Abstract

This paper presents lossless prefix codes optimized with respect to a pay-off criterion consisting of a 
convex combination of maximum codeword length and average codeword length. The optimal codeword 
lengths obtained are based on a new coding algorithm which transforms the initial source probability 
vector into a new probability vector according to a merging rule. The coding algorithm is equivalent to 
a partition of the source alphabet into disjoint sets on which a new transformed probability vector is 
defined as a function of the initial source probability vector and a scalar parameter. The pay-off criterion 
considered encompasses a trade-off between maximum and average codeword length; it is related to 
a pay-off criterion consisting of a convex combination of average codeword length and average of an 
exponential function of the codeword length, and to an average codeword length pay-off criterion subject 
to a limited length constraint. A special case of the first related pay-off is connected to coding problems 
involving source probability uncertainty and codeword overflow probability, while the second related 
pay-off complements limited-length Huffman coding algorithms.

I. Introduction 

Lossless fixed-to-variable length source codes are usually examined under either known or
unknown source probability distributions. For known source probability
distributions there is an extensive literature which aims at minimizing various pay-offs such as
the average codeword length [2, Section 5.3], the average redundancy of the codeword length
[3], [4], the average of an exponential function of the codeword length [5]-[7], the average
of an exponential function of the redundancy of the codeword length [4], [7], [8], and the
probability of codeword length overflow [9], [10]. On the other hand, universal coding and
universal modeling, and the so-called Minimum Description Length (MDL) principle, are often
examined via minimax techniques when the source probability distribution is unknown but
belongs to a pre-specified class of source distributions [3], [11]-[14]. With respect to the above
pay-offs, Shannon codes find sub-optimal code lengths by treating them as real numbers, while
Huffman codes find the optimal code lengths by treating them as integers. Coding algorithms
for general pay-off criteria involving pointwise redundancy, average exponential redundancy, and
maximum pointwise redundancy are found in [15].

The main objectives of this paper are to introduce a new pay-off criterion consisting of a convex
combination of the maximum codeword length and the average codeword length, to derive lossless
prefix codes, to discuss the implications of these codes for variable length coding applications,
and to identify relations of the new pay-off to other pay-offs addressed in the literature. The
criterion considered incorporates a trade-off between average codeword length and maximum codeword
length, which makes the new coding algorithm suitable for length sensitive coding applications.
It is general enough to encompass as a special case some of the pay-off criteria investigated
in the literature, such as limited-length coding [16] and coding for exponential functions of the
codeword length, while it is easily generalized to universal coding in which the source probability
vector belongs to a class.

The new pay-off criterion considered is discussed under Problem 1 of Section I-A, while
its connections to other pay-off criteria such as limited-length codes and codes obtained via a
convex combination of the average and an exponential function of the codeword length are discussed
in Sections III-B and III-C, respectively. Extensions of the new pay-off to universal codes are
discussed in Section III-D.



A. Problem Formulation and Discussion of Results 

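Consider a source with alphabet $\mathcal{X} \triangleq \{x_1, x_2, \ldots, x_{|\mathcal{X}|}\}$ of cardinality $|\mathcal{X}|$, generating symbols
according to the probability distribution $\mathbf{p} \triangleq \{p(x) : x \in \mathcal{X}\} \equiv (p(x_1), p(x_2), \ldots, p(x_{|\mathcal{X}|}))$.
Source symbols are encoded into $D$-ary codewords. A code $\mathcal{C} \triangleq \{c(x) : x \in \mathcal{X}\}$ for symbols
in $\mathcal{X}$ with image alphabet $\mathcal{D} \triangleq \{0, 1, 2, \ldots, D-1\}$ is an injective map $c : \mathcal{X} \to \mathcal{D}^*$, where
$\mathcal{D}^*$ is the set of finite sequences drawn from $\mathcal{D}$. For $x \in \mathcal{X}$ each codeword $c(x) \in \mathcal{D}^*$, $c \in \mathcal{C}$,
is identified with a codeword length $l(x) \in \mathbb{Z}_+$, where $\mathbb{Z}_+$ is the set of non-negative integers.
Thus, a code $\mathcal{C}$ for source symbols from the alphabet $\mathcal{X}$ is associated with the length function
of the code $l : \mathcal{X} \to \mathbb{Z}_+$, and a code defines a codeword length vector $\mathbf{l} \triangleq \{l(x) : x \in \mathcal{X}\} \equiv
(l(x_1), l(x_2), \ldots, l(x_{|\mathcal{X}|})) \in \mathbb{Z}_+^{|\mathcal{X}|}$. Since a function $l : \mathcal{X} \to \mathbb{Z}_+$ is the length function of some
prefix code if, and only if, the Kraft inequality holds [2, Section 5.2], the admissible set
of codeword length vectors is defined by

$$\mathcal{L}\big(\mathbb{Z}_+^{|\mathcal{X}|}\big) \triangleq \Big\{ \mathbf{l} \in \mathbb{Z}_+^{|\mathcal{X}|} : \sum_{x \in \mathcal{X}} D^{-l(x)} \le 1 \Big\}.$$

On the other hand, if the integer constraint is relaxed by admitting real-valued length vectors $\mathbf{l} \in \mathbb{R}_+^{|\mathcal{X}|}$
which satisfy the Kraft inequality, such as Shannon codes or arithmetic codes, then $\mathcal{L}\big(\mathbb{Z}_+^{|\mathcal{X}|}\big)$ is
replaced by

$$\mathcal{L}\big(\mathbb{R}_+^{|\mathcal{X}|}\big) \triangleq \Big\{ \mathbf{l} \in \mathbb{R}_+^{|\mathcal{X}|} : \sum_{x \in \mathcal{X}} D^{-l(x)} \le 1 \Big\}.$$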

Such codes are useful in obtaining approximate solutions which are less computationally intensive
[2, Section 5.3]. Without loss of generality, it is assumed that the set of probability distributions
is defined by

$$\mathbb{P}(\mathcal{X}) \triangleq \Big\{ \mathbf{p} = (p(x_1), \ldots, p(x_{|\mathcal{X}|})) \in \mathbb{R}_+^{|\mathcal{X}|} : p(x_{|\mathcal{X}|}) > 0,\ p(x_i) \le p(x_j),\ \forall i > j,\ x_i, x_j \in \mathcal{X},\ \sum_{x \in \mathcal{X}} p(x) = 1 \Big\}.$$

Unless specified otherwise, the following notation is used: $\log(\cdot) \triangleq \log_D(\cdot)$, and $\mathbb{H}(\mathbf{p})$ is the
entropy of the probability distribution $\mathbf{p}$.
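
As a small illustration of the admissible sets defined through the Kraft inequality, the following sketch checks whether a vector of integer codeword lengths can belong to a $D$-ary prefix code. The function names `kraft_sum` and `is_admissible` are hypothetical helpers introduced only for this example; exact rational arithmetic is used so that the boundary case (Kraft sum equal to one) is not affected by rounding.

```python
from fractions import Fraction


def kraft_sum(lengths, D=2):
    """Exact value of sum_x D^(-l(x)) for integer codeword lengths."""
    return sum(Fraction(1, D ** l) for l in lengths)


def is_admissible(lengths, D=2):
    """True when the lengths satisfy the Kraft inequality (sum <= 1)."""
    return kraft_sum(lengths, D) <= 1


# Lengths of a complete binary code tree: the Kraft sum equals one.
print(kraft_sum([1, 2, 3, 3]))
# Two codewords of length 1 plus one of length 2 cannot form a binary prefix code.
print(is_admissible([1, 1, 2]))
```

For real-valued lengths the same sum can be evaluated in floating point, at the cost of a tolerance in the comparison with one.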

The main pay-off considered is a convex combination of the maximum codeword length and
the average codeword length. Specifically, a parameter $\alpha \in [0, 1]$ is introduced which weights
the maximum codeword length, while $(1 - \alpha)$ weights the average codeword length. As this
parameter moves away from $\alpha = 0$, more weight is put on reducing the maximum codeword
length, so the maximum length of the code is reduced, resulting in a more balanced code tree.
Such a pay-off is particularly important in applications where the codeword lengths are bounded
by a specific constant. The main problem investigated is stated below.

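Problem 1. Given a known source probability vector $\mathbf{p} \in \mathbb{P}(\mathcal{X})$ and weighting parameter
$\alpha \in [0, 1]$, find a prefix code length vector $\mathbf{l}^{\dagger} \in \mathbb{R}_+^{|\mathcal{X}|}$ which minimizes the Maximum and Average
Length pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ defined by

$$\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p}) \triangleq \Big\{ \alpha \|\mathbf{l}\|_{\infty} + (1 - \alpha) \sum_{x \in \mathcal{X}} l(x) p(x) \Big\}, \qquad \|\mathbf{l}\|_{\infty} \triangleq \max_{x \in \mathcal{X}} l(x). \tag{1}$$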
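
The pay-off (1) is straightforward to evaluate for any candidate length vector. The sketch below is only illustrative: the helper name `payoff` and the example source are assumptions made here, not part of the paper. It compares a skewed and a balanced binary code as the weight on the maximum length grows.

```python
def payoff(lengths, p, alpha):
    """Maximum and Average Length pay-off (1): alpha*max(l) + (1-alpha)*sum(l*p)."""
    assert len(lengths) == len(p)
    average = sum(l * q for l, q in zip(lengths, p))
    return alpha * max(lengths) + (1 - alpha) * average


p = [0.5, 0.25, 0.125, 0.125]   # illustrative source, sorted non-increasingly
skewed = [1, 2, 3, 3]           # Huffman/Shannon lengths for p
balanced = [2, 2, 2, 2]         # fixed-length code

for alpha in (0.0, 0.5, 1.0):
    print(alpha, payoff(skewed, p, alpha), payoff(balanced, p, alpha))
```

For this source the skewed code is preferable at $\alpha = 0$, while the balanced code gives the smaller pay-off at $\alpha = 0.5$ and $\alpha = 1$, which is the trade-off the criterion is meant to capture.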

The presence of the $\ell_{\infty}$ norm (i.e., $\|\mathbf{l}\|_{\infty}$) in the pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ makes the characterization
of the optimal real-valued prefix code, which is parametrically dependent on $\alpha \in [0,1]$, very
different from previously known Shannon type codes. Indeed, it is shown in subsequent sections
that the optimal code corresponding to Problem 1 is equivalent to a specific partition of the
source alphabet, and re-normalization and merging of entries of the initial source probability
vector, as a function of the parameter $\alpha \in [0, 1]$, from which the optimal code is derived. The
single letter performance of the optimal codeword lengths $\{l^{\dagger}(x) : x \in \mathcal{X}\}$ satisfies
$\mathbb{H}(\mathbf{w}_{\alpha}) \le \mathbb{L}_{\alpha}(\mathbf{l}^{\dagger}, \mathbf{p}) < \mathbb{H}(\mathbf{w}_{\alpha}) + 1$, where $\mathbf{w}_{\alpha} \triangleq \{w_{\alpha}(x) : x \in \mathcal{X}\}$ is a new probability vector which depends
on the initial source probability vector and the parameter $\alpha \in [0,1]$. As $\alpha \in [0, 1]$ increases the
optimal code tree moves towards a more balanced code tree, while there is an
$\alpha_{\max} \in [0, 1]$ which is the minimum value beyond which there is no compression.
An algorithm is presented which computes the weight vector $\mathbf{w}_{\alpha}$ via partitioning of the source
alphabet, re-normalization and merging of the initial source probability vector, for any value of
$\alpha \in [0,1]$, having a worst case computational complexity of order $O(n)$.

Problem 1, as suggested by one of the reviewers, can also be solved in a waterfilling-like
fashion. For completeness and direct comparison with the methodology suggested in this paper,
the solution to Problem 1 is included in Appendix B.



B. Relations to Literature 



In Section III-B it is shown that limited-length coding problems defined by minimizing the
average codeword length subject to a maximum codeword length constraint (Problem 2) are
deduced from the solution of Problem 1 as a special case. This connection provides Shannon
type codes, and complements the recent work on limited-length Huffman codes [16]. Specifically,
given a hard constraint $L_{\lim} \in [1, \infty)$, consider the problem of finding a prefix code length vector $\mathbf{l}^{\dagger} \in \mathbb{R}_+^{|\mathcal{X}|}$
which minimizes the Average Length Subject to Maximum Length Constraint pay-off $\mathbb{L}(\mathbf{l}, \mathbf{p})$
defined by

$$\mathbb{L}(\mathbf{l}, \mathbf{p}) \triangleq \sum_{x \in \mathcal{X}} l(x) p(x), \tag{2a}$$
$$\text{subject to } \max_{x \in \mathcal{X}} l(x) \le L_{\lim}. \tag{2b}$$

Its solution is obtained from the solution of Problem 1 over $\alpha \in [0, 1]$. The connection is established by
introducing a real-valued Lagrange multiplier $\mu$ associated with the constraint on the maximum
length, via the unconstrained pay-off defined by

$$\begin{aligned}
\mathbb{L}(\mathbf{l}, \mathbf{p}, \mu) &\triangleq \sum_{x \in \mathcal{X}} l(x) p(x) + \mu \Big( \max_{x \in \mathcal{X}} l(x) - L_{\lim} \Big), \qquad \mu \ge 0 \\
&= \mu \max_{x \in \mathcal{X}} l(x) + \sum_{x \in \mathcal{X}} l(x) p(x) - \mu L_{\lim}.
\end{aligned} \tag{3}$$

Hence, the optimal code for limited-length codes is obtained from the optimal code solution of
Problem 1, by substituting $\mu = \alpha / (1 - \alpha)$, and then relating the value of the Lagrange multiplier
with a specific value of $\alpha$ for which the codeword lengths will be limited by $L_{\lim}$. The complete
characterization of the solution to such problems is given in Section III-B, which also includes
an algorithm.
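
The substitution $\mu = \alpha/(1-\alpha)$ can be made explicit by a one-line rescaling of the unconstrained pay-off (3). The following is a sketch of that step, restricted to $\alpha \in [0,1)$ so that $\mu$ is finite; it uses only quantities already defined in (1) and (3).

```latex
\begin{align*}
(1-\alpha)\,\mathbb{L}(\mathbf{l},\mathbf{p},\mu)\Big|_{\mu=\frac{\alpha}{1-\alpha}}
  &= \alpha \max_{x\in\mathcal{X}} l(x)
     + (1-\alpha)\sum_{x\in\mathcal{X}} l(x)\,p(x)
     - \alpha L_{\lim} \\
  &= \mathbb{L}_{\alpha}(\mathbf{l},\mathbf{p}) - \alpha L_{\lim}.
\end{align*}
```

Since $(1-\alpha) > 0$ and the term $\alpha L_{\lim}$ does not depend on $\mathbf{l}$, for a fixed multiplier the unconstrained pay-off and the pay-off of Problem 1 are minimized by the same length vector.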



In Section III-C it is shown that Problem 1 is also related to a general pay-off consisting of a
convex combination of the average codeword length and the average of an exponential function of
the codeword length (Problem 3) defined by

$$\mathbb{L}_{t,\alpha}(\mathbf{l}, \mathbf{p}) \triangleq \frac{\alpha}{t} \log \Big( \sum_{x \in \mathcal{X}} p(x) D^{t\, l(x)} \Big) + (1 - \alpha) \sum_{x \in \mathcal{X}} l(x) p(x), \tag{4}$$

where $t \in (-\infty, \infty)$ is another parameter. Specifically, by noticing that $\frac{1}{t} \log \sum_{x \in \mathcal{X}} p(x) D^{t\, l(x)}$ is
a nondecreasing function of $t \in [0, \infty)$, and $\lim_{t \to \infty} \frac{1}{t} \log \sum_{x \in \mathcal{X}} p(x) D^{t\, l(x)} = \max_{x \in \mathcal{X}} l(x)$, then
by replacing $\alpha \max_{x \in \mathcal{X}} l(x)$ in $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ by the function $\frac{\alpha}{t} \log \big( \sum_{x \in \mathcal{X}} p(x) D^{t\, l(x)} \big)$, the resulting
pay-off takes into account moderate values below $\max_{x \in \mathcal{X}} l(x)$, obtaining the two-parameter
pay-off (4). The pay-off $\mathbb{L}_{t,\alpha}(\mathbf{l}, \mathbf{p})$ is a convex combination of the average of an exponential function
of the codeword length, and the average codeword length. The case $\alpha = 1$ is investigated in
[10], where relations to minimizing buffer overflow probability are discussed. Further,
it is not difficult to verify that $\mathbb{L}_{t,\alpha}|_{\alpha=1}(\mathbf{l}, \mathbf{p})$ is also the dual problem of universal coding
problems, formulated as a minimax, in which the maximization is over a class of probability
distributions which satisfy a relative entropy constraint with respect to a given fixed nominal
probability distribution [14], [17]. Hence, the pay-off $\mathbb{L}_{t,\alpha}|_{\alpha=1}(\mathbf{l}, \mathbf{p})$ encompasses a trade-off
between universal codes and buffer overflow probability and average codeword length codes.
Since the pay-off $\mathbb{L}_{t,\alpha}(\mathbf{l}, \mathbf{p})$ is in the limit, as $t \to \infty$, equivalent to $\lim_{t \to \infty} \mathbb{L}_{t,\alpha}(\mathbf{l}, \mathbf{p}) = \mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$,
$\forall \alpha \in [0, 1]$, the codeword length vector minimizing $\mathbb{L}_{t,\alpha}(\mathbf{l}, \mathbf{p})$ is expected to converge in the
limit as $t \to \infty$ to that which minimizes $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$. However, moderate values of $t \in [0, \infty)$ are
also of interest since the pay-off $\mathbb{L}_{t,\alpha}(\mathbf{l}, \mathbf{p})$ can be interpreted as a trade-off between universal
codes and average length codes.
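
The two properties of the exponential term used above (monotonicity in $t$ and convergence to the maximum length) can be observed numerically. The sketch below is an illustration under assumed inputs: the helper name `exp_length`, the example source and the grid of $t$ values are choices made here, and only $t > 0$ is evaluated.

```python
import math


def exp_length(lengths, p, t, D=2):
    """(1/t) * log_D( sum_x p(x) * D^(t*l(x)) ) for t > 0."""
    total = sum(q * D ** (t * l) for l, q in zip(lengths, p))
    return math.log(total, D) / t


p = [0.5, 0.25, 0.125, 0.125]   # illustrative source
lengths = [1, 2, 3, 3]          # illustrative codeword lengths
average = sum(l * q for l, q in zip(lengths, p))

# The values grow from about the average length towards max(lengths) = 3.
values = [exp_length(lengths, p, t) for t in (0.01, 0.1, 1, 10, 50)]
for t, v in zip((0.01, 0.1, 1, 10, 50), values):
    print(t, round(v, 4))
```

For small $t$ the term is close to the average codeword length, and for large $t$ it approaches $\max_{x} l(x)$, so $t$ interpolates between the two components of the pay-off in Problem 1.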



Finally, in Section III-D it is demonstrated that the optimal codes obtained for the new pay-off can
be used to solve universal coding problems, formulated as minimax problems, in which the initial
source probability vector belongs to a specified family of probability vectors $\mathbb{S}(\mathcal{X}) \subseteq \mathbb{P}(\mathcal{X})$,
with respect to the pay-off

$$\mathbb{L}^{+}_{\alpha}(\mathbf{l}, \mathbf{p}) \triangleq \max_{\mathbf{p} \in \mathbb{S}(\mathcal{X})} \Big\{ \alpha \max_{x \in \mathcal{X}} l(x) + (1 - \alpha) \sum_{x \in \mathcal{X}} l(x) p(x) \Big\}. \tag{5}$$

The rest of the paper is organized as follows. Section II addresses Problem 1 and derives basic
results concerning the partition of the source alphabet, and the re-normalization and merging rule
as $\alpha$ ranges over $[0,1]$. Here, an algorithm is presented which describes how the partition of
the source alphabet is characterized. Section III gives the complete characterization of optimal
codes corresponding to Problem 1, the associated coding theorem, and relations to limited-length
coding problems (Problem 2), and coding problems with a general pay-off consisting of a convex
combination of the average codeword length and the average of an exponential function of the codeword
length (Problem 3). Section IV provides illustrative examples. Finally, Section V presents the
conclusions and identifies open problems for future research.

II. Optimal Weights and Merging Rule 
The main objective of this section is to convert the pay-off of Problem 1 into an equivalent
objective of the form $\sum_{x \in \mathcal{X}} w_{\alpha}(x) l(x)$, where the new weights $\mathbf{w}_{\alpha} \triangleq \{w_{\alpha}(x) : x \in \mathcal{X}\}$ depend
parametrically on $\alpha \in [0, 1]$. Subsequently, we derive certain properties of the new weight vector
as a function of the initial source probability vector and $\alpha \in [0,1]$, and identify how these
properties are transformed into equivalent properties for the optimal codeword length vector.
The main issue here is to identify how symbols are merged together, and how the merging
changes as a function of the parameter $\alpha \in [0, 1]$ and the initial source probability vector, so that
the optimal solution is characterized for all $\alpha \in [0, 1]$. From these properties the optimal
real-valued codeword lengths for Problem 1 will be found. This merging will also provide insight into
characterizing optimal codes for related problems (with different pay-offs).
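Define

$$l^{*} \triangleq \max_{x \in \mathcal{X}} l(x), \qquad \mathcal{U} \triangleq \big\{ x \in \mathcal{X} : l(x) = l^{*} \big\}. \tag{6}$$

The pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ can be written as

$$\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p}) = \alpha l^{*} + (1 - \alpha) \sum_{x \in \mathcal{X}} l(x) p(x) = \Big( \alpha + (1 - \alpha) \sum_{x \in \mathcal{U}} p(x) \Big) l^{*} + \sum_{x \in \mathcal{U}^{c}} (1 - \alpha) p(x) l(x), \tag{7}$$

which makes the dependence on the disjoint sets $\mathcal{U}$ and $\mathcal{U}^{c} \triangleq \mathcal{X} \setminus \mathcal{U}$ explicit. The set $\mathcal{U}$ remains
to be identified so that a solution to the coding problem exists for all $\alpha \in [0,1]$.
Note that $l^{*} = l^{*}(\alpha)$ and $\mathcal{U} = \mathcal{U}(\alpha)$, that is, both the maximum length and the set of source
symbols which correspond to the maximum length depend parametrically on $\alpha \in [0,1]$. This
explicit dependence will often be omitted for simplicity of notation.

Define

$$\sum_{x \in \mathcal{U}} w_{\alpha}(x) \triangleq \alpha + (1 - \alpha) \sum_{x \in \mathcal{U}} p(x), \qquad w_{\alpha}(x) \triangleq (1 - \alpha) p(x), \quad x \in \mathcal{U}^{c}. \tag{8}$$

Using (7) and (8), the pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ is written as a function of the new weight vector as
follows:

$$\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p}) = \mathbb{L}(\mathbf{l}, \mathbf{w}_{\alpha}) \triangleq \sum_{x \in \mathcal{X}} w_{\alpha}(x) l(x), \qquad \forall \alpha \in [0, 1]. \tag{9}$$

The new weight vector $\mathbf{w}_{\alpha}$ is a function of $\alpha$ and the source probability vector $\mathbf{p} \in \mathbb{P}(\mathcal{X})$, and
it is defined over the two disjoint sets $\mathcal{U}$ and $\mathcal{U}^{c}$. It can be easily verified that $0 \le w_{\alpha}(x) \le 1$,
$\forall x \in \mathcal{U}^{c}$, and $\sum_{x \in \mathcal{X}} w_{\alpha}(x) = 1$, $\forall \alpha \in [0, 1]$. However, at this stage it cannot be verified that
$w_{\alpha}(x) \ge 0$, $\forall x \in \mathcal{U}$.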

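The next lemma finds the optimal codeword length vector.

Lemma 1. The real-valued prefix codes minimizing pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ for $\alpha \in [0, 1)$ are given by

$$l^{\dagger}(x) = \begin{cases}
-\log \big( (1 - \alpha) p(x) \big) = -\log w^{\dagger}_{\alpha}(x), & x \in \mathcal{U}^{c}, \\[4pt]
-\log \Big( \dfrac{\alpha + (1 - \alpha) \sum_{x \in \mathcal{U}} p(x)}{|\mathcal{U}|} \Big) = -\log w^{\dagger}_{\alpha}(x), & x \in \mathcal{U},
\end{cases} \tag{10}$$

where $\mathcal{U}$ and $\mathcal{U}^{c}$ remain to be identified. Note that for $\alpha = 1$, $\mathcal{X} = \mathcal{U}$ and $l^{\dagger}(x) = \log |\mathcal{X}|$, $\forall x \in \mathcal{X}$.

Proof: See Appendix A-A.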



The point to be made regarding Lemma 1 is twofold. Firstly, since for $\alpha \in [0, 1)$ the pay-off
$\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ is continuous in $\mathbf{l}$ and the constraint set defined by the Kraft inequality is closed and
bounded (and hence compact), an optimal code length vector $\mathbf{l}^{\dagger}$ exists, and secondly the optimal
code is given by (10). From the existence of the solution, it follows that for $\alpha \in [0, 1)$, $w_{\alpha}(x) \ge 0$,
$\forall x \in \mathcal{U}$. This can also be deduced by noticing that the pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ is positive. As a result,
all the weights satisfy $w_{\alpha}(x) \ge 0$, $\forall x \in \mathcal{U}$; otherwise, if there existed a negative weight $w_{\alpha}(x)$, one
could choose its corresponding codeword length large enough to make the pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$
negative.

From the characterization of the optimal code length vector of Lemma 1 and a well-known inequality,
it follows that $\mathbb{L}_{\alpha}(\mathbf{l}^{\dagger}, \mathbf{p}) = -\sum_{x \in \mathcal{X}} w_{\alpha}(x) \log w^{\dagger}_{\alpha}(x) \ge \mathbb{H}(\mathbf{w}_{\alpha})$, and equality holds if, and only if,
$w_{\alpha}(x) = w^{\dagger}_{\alpha}(x)$, $\forall x \in \mathcal{X}$. Therefore, for $\alpha \in [0, 1)$ the weights satisfying (8) and corresponding
to the optimal code length vector are uniquely represented via $\mathbf{w}_{\alpha} = \mathbf{w}^{\dagger}_{\alpha}$. Moreover, by rounding
off the optimal codeword lengths via $l^{\ddagger}(x) \triangleq \lceil -\log w^{\dagger}_{\alpha}(x) \rceil$ the Kraft inequality remains valid, while
it is concluded that $\mathbb{H}(\mathbf{w}_{\alpha}) \le \sum_{x \in \mathcal{X}} l^{\ddagger}(x) w_{\alpha}(x) < \mathbb{H}(\mathbf{w}_{\alpha}) + 1$.

The important observation concerning the prefix code length vector $\mathbf{l}^{\dagger} \in \mathbb{R}_+^{|\mathcal{X}|}$ which minimizes
the pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p}) = \sum_{x \in \mathcal{X}} w_{\alpha}(x) l(x)$ is that once the weight vector $\mathbf{w}_{\alpha}$ is identified for all
$\alpha \in [0, 1)$, then the optimal code is given by $l^{\dagger}(x) = -\log w_{\alpha}(x)$, $\forall x \in \mathcal{X}$, and it is characterized
for all $\alpha \in [0, 1)$. The remaining part of this section is devoted to the problem of identifying the
sets $\mathcal{U}$ and $\mathcal{U}^{c}$.
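
Once a partition is fixed, the weights in (8) and the lengths in (10) follow directly. The sketch below assumes that the candidate set of maximum-length symbols consists of the `m` least probable symbols (with `p` sorted in non-increasing order) and that the merged mass is shared equally among them, as in Lemma 1; the function names and the example source are hypothetical and serve only as an illustration.

```python
import math


def weights_for_partition(p, alpha, m):
    """Weights (8) when U is the set of the m least probable symbols.

    p must be sorted in non-increasing order; the merged mass
    alpha + (1 - alpha) * sum_{x in U} p(x) is split equally over U.
    """
    n = len(p)
    head = [(1 - alpha) * q for q in p[: n - m]]          # symbols in U^c
    merged = alpha + (1 - alpha) * sum(p[n - m:])          # total weight of U
    return head + [merged / m] * m


def real_lengths(w, D=2):
    """Real-valued lengths l(x) = -log_D w(x), as in (10)."""
    return [-math.log(wx, D) for wx in w]


p = [0.5, 0.25, 0.125, 0.125]
w = weights_for_partition(p, alpha=0.1, m=2)
lengths = real_lengths(w)
print(w)
print(lengths)
```

Because the weights sum to one, the resulting real-valued lengths meet the Kraft inequality with equality. Whether a given `m` is the correct partition for a given $\alpha$ is exactly the question addressed in the remainder of the section.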

The next lemma describes monotonicity properties of the weight vector $\mathbf{w}_{\alpha}$ as a function of the
probability vector $\mathbf{p}$, for all $\alpha \in [0, 1)$.

Lemma 2. Consider pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ and real-valued prefix codes. The following statements
hold:

1. For $\{x, y\} \subset \mathcal{X}$, if $p(x) \le p(y)$ then $w_{\alpha}(x) \le w_{\alpha}(y)$, for all $\alpha \in [0,1)$. Equivalently,
$p(x_1) \ge p(x_2) \ge \ldots \ge p(x_{|\mathcal{X}|}) > 0$ implies $w_{\alpha}(x_1) \ge w_{\alpha}(x_2) \ge \ldots \ge w_{\alpha}(x_{|\mathcal{X}|}) > 0$, for all
$\alpha \in [0, 1)$.

2. For $y \in \mathcal{U}^{c}$, $w_{\alpha}(y)$ is a monotonically decreasing function of $\alpha \in [0, 1)$.

3. For $x \in \mathcal{U}$, $w_{\alpha}(x)$ is a monotonically increasing function of $\alpha \in [0, 1)$.

Proof: There exist three cases; more specifically,

1) $x, y \in \mathcal{U}^{c}$: then $w_{\alpha}(x) = (1 - \alpha) p(x) \le (1 - \alpha) p(y) = w_{\alpha}(y)$, $\forall \alpha \in [0, 1)$;

2) $x, y \in \mathcal{U}$: $w_{\alpha}(x) = w_{\alpha}(y) = w^{*}_{\alpha} \triangleq \min_{x \in \mathcal{X}} w_{\alpha}(x)$;

3) $x \in \mathcal{U}$, $y \in \mathcal{U}^{c}$ (or $x \in \mathcal{U}^{c}$, $y \in \mathcal{U}$): consider the case $x \in \mathcal{U}$, $y \in \mathcal{U}^{c}$. Since $x \in \mathcal{U}$,
$l(x) \ge l(y)$ and equivalently $w_{\alpha}(y) \ge w_{\alpha}(x)$. Then, by taking derivatives we
have

$$\frac{\partial w_{\alpha}(y)}{\partial \alpha} = -p(y) < 0, \qquad y \in \mathcal{U}^{c}, \tag{11}$$

$$\frac{\partial w_{\alpha}(x)}{\partial \alpha} = \frac{\partial w^{*}_{\alpha}}{\partial \alpha} > 0, \qquad x \in \mathcal{U}. \tag{12}$$

According to (11) and (12), for $\alpha = 0$, $w_{\alpha}(y)|_{\alpha=0} = p(y) \ge w_{\alpha}(x)|_{\alpha=0} = p(x)$, and as a
function of $\alpha \in [0, 1)$, for $y \in \mathcal{U}^{c}$ the weight $w_{\alpha}(y)$ decreases, and for $x \in \mathcal{U}$ the weight
$w_{\alpha}(x)$ increases. Hence, since $w_{\alpha}(\cdot)$ is a continuous function with respect to $\alpha$, at some
$\alpha = \alpha'$, $w_{\alpha'}(x) = w_{\alpha'}(y) = w^{*}_{\alpha'}$. Suppose that $w_{\alpha}(x) \ne w_{\alpha}(y)$ for some $\alpha = \alpha' + d\alpha$,
$d\alpha > 0$. Then, the largest weight will decrease and the smallest weight will increase as a
function of $\alpha \in [0, 1)$ according to (11) and (12), respectively. As a result, the two weights
move together as a single weight.

■

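Before deriving the general coding algorithm, the following remark is introduced to illustrate
how the weights $\mathbf{w}_{\alpha}$ and the cardinality of the set $\mathcal{U}$ change as a function of $\alpha \in [0, 1)$.

Remark 1. Consider the special case when the probability vector $\mathbf{p} \in \mathbb{P}(\mathcal{X})$ consists of
distinct probabilities, e.g., that $p(x_{|\mathcal{X}|}) < p(x_{|\mathcal{X}|-1})$. The goal is to characterize the weights in
a subset of $\alpha \in [0, 1)$ such that $w_{\alpha}(x_{|\mathcal{X}|}) < w_{\alpha}(x_{|\mathcal{X}|-1})$ holds. Since $\mathcal{U} = \{x_{|\mathcal{X}|}\}$ and $|\mathcal{U}| = 1$,
then

$$\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p}) = \big[ \alpha + (1 - \alpha) p(x_{|\mathcal{X}|}) \big] l^{*} + \sum_{x \in \mathcal{U}^{c}} (1 - \alpha) p(x) l(x) = \sum_{x \in \mathcal{X}} l(x) w_{\alpha}(x),$$

where the weights are given by $w_{\alpha}(x) = (1 - \alpha) p(x)$, $x \in \mathcal{U}^{c}$, and $w_{\alpha}(x_{|\mathcal{X}|}) = \alpha + (1 - \alpha) p(x_{|\mathcal{X}|})$
(by Lemma 1). For any $\alpha \in [0, 1)$ such that the condition $w_{\alpha}(x_{|\mathcal{X}|}) < w_{\alpha}(x_{|\mathcal{X}|-1})$ holds, the
optimal codeword lengths are given by $-\log w_{\alpha}(x)$, $x \in \mathcal{X}$, and this region of $\alpha \in [0,1)$ for
which $|\mathcal{U}| = 1$ is

$$\Big\{ \alpha \in [0, 1) : \alpha + (1 - \alpha) p(x_{|\mathcal{X}|}) < (1 - \alpha) p(x_{|\mathcal{X}|-1}) \Big\}.$$

Equivalently,

$$\Big\{ \alpha \in [0, 1) : \alpha < \frac{p(x_{|\mathcal{X}|-1}) - p(x_{|\mathcal{X}|})}{1 + p(x_{|\mathcal{X}|-1}) - p(x_{|\mathcal{X}|})} \Big\}. \tag{13}$$

Hence, under the condition $|\mathcal{U}| = 1$ (i.e., $w_{\alpha}(x_{|\mathcal{X}|}) < w_{\alpha}(x_{|\mathcal{X}|-1})$), the optimal codeword lengths
are given by $-\log w_{\alpha}(x)$, $x \in \mathcal{X}$, for $\alpha < \alpha_1 \triangleq \frac{p(x_{|\mathcal{X}|-1}) - p(x_{|\mathcal{X}|})}{1 + p(x_{|\mathcal{X}|-1}) - p(x_{|\mathcal{X}|})}$, while for $\alpha \ge \alpha_1$ the form
of the minimization problem changes, as more weights $w_{\alpha}(x)$ are such that $x \in \mathcal{U}$, and the
cardinality of $\mathcal{U}$ is changed (that is, the partition of $\mathcal{X}$ into $\mathcal{U}$ and $\mathcal{U}^{c}$ is changed).
Note that when $p(x_{|\mathcal{X}|}) = p(x_{|\mathcal{X}|-1})$, in view of the continuity of the weights $\mathbf{w}_{\alpha}$ as a function
of $\alpha \in [0, 1)$, the above optimal codeword lengths are only characterized for the singleton point
$\alpha = \alpha_1 = 0$, giving the classical codeword lengths. For $\alpha \in (\alpha_1, 1)$ the problem should be
reformulated to characterize its solution over this region, for which $|\mathcal{U}| \ne 1$. For example, if we
consider the case for which $\alpha > \alpha_1$ and $|\mathcal{U}| = 2$, the problem can be written as

$$\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p}) = \Big( \alpha + (1 - \alpha) \big( p(x_{|\mathcal{X}|}) + p(x_{|\mathcal{X}|-1}) \big) \Big) l^{*} + \sum_{x \in \mathcal{U}^{c}} (1 - \alpha) p(x) l(x) = \sum_{x \in \mathcal{X}} l(x) w_{\alpha}(x).$$

For any $\alpha \in [\alpha_1, 1)$ such that the condition $w_{\alpha}(x_{|\mathcal{X}|-1}) < w_{\alpha}(x_{|\mathcal{X}|-2})$ holds, the optimal codeword
lengths are given by $-\log w_{\alpha}(x)$, $x \in \mathcal{X}$, and this region is specified by

$$\Big\{ \alpha \in [\alpha_1, 1) : \frac{\alpha + (1 - \alpha) \big( p(x_{|\mathcal{X}|}) + p(x_{|\mathcal{X}|-1}) \big)}{|\mathcal{U}|} < (1 - \alpha) p(x_{|\mathcal{X}|-2}) \Big\}.$$

Equivalently,

$$\Big\{ \alpha \in [0, 1) : \alpha_1 \le \alpha < \frac{|\mathcal{U}| p(x_{|\mathcal{X}|-2}) - \big( p(x_{|\mathcal{X}|}) + p(x_{|\mathcal{X}|-1}) \big)}{1 + |\mathcal{U}| p(x_{|\mathcal{X}|-2}) - \big( p(x_{|\mathcal{X}|}) + p(x_{|\mathcal{X}|-1}) \big)} \Big\}. \tag{14}$$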
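
The two thresholds in (13) and (14) can be checked numerically for a source with distinct probabilities. In the sketch below, `alpha_1` and `alpha_2` are hypothetical helper names, the probability vector is an arbitrary example sorted in non-increasing order, and (14) is evaluated with two merged symbols.

```python
def alpha_1(p):
    """Threshold (13): upper end of the range of alpha for which |U| = 1."""
    gap = p[-2] - p[-1]
    return gap / (1 + gap)


def alpha_2(p):
    """Upper threshold in (14), evaluated with |U| = 2."""
    num = 2 * p[-3] - (p[-1] + p[-2])
    return num / (1 + num)


p = [0.4, 0.3, 0.2, 0.1]
a1, a2 = alpha_1(p), alpha_2(p)

# At alpha_1 the two smallest weights coincide.
w_last = a1 + (1 - a1) * p[-1]
w_prev = (1 - a1) * p[-2]

# At alpha_2 the merged weight reaches the third smallest weight.
w_merged = (a2 + (1 - a2) * (p[-1] + p[-2])) / 2
w_third = (1 - a2) * p[-3]

print(a1, w_last, w_prev)
print(a2, w_merged, w_third)
```

For this example the thresholds are $1/11$ and $3/13$, and at each threshold the two weights being compared are equal up to rounding, which is the merging behaviour described in the remark.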
Next, the merging rule which describes how the weight vector $\mathbf{w}_{\alpha}$ changes as a function of
$\alpha \in [0, 1)$ is identified, such that a solution to the coding problem is completely characterized
for arbitrary cardinality $|\mathcal{U}|$, and not necessarily distinct probabilities, for any $\alpha \in [0, 1)$. Clearly,
there is a minimum $\alpha$, called $\alpha_{\max}$, such that for any $\alpha \in [\alpha_{\max}, 1]$ there is no compression. This
$\alpha_{\max}$ will be identified as well.

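Consider the complete characterization of the solution, as $\alpha$ ranges over $[0, 1)$, for any initial
probability vector $\mathbf{p}$ (not necessarily consisting of distinct entries). Then, $|\mathcal{U}| \in \{1, 2, \ldots, |\mathcal{X}| - 1\}$,
while for $|\mathcal{U}| = |\mathcal{X}|$, $\alpha \in [\alpha_{\max}, 1]$, there is no compression since the weights are all equal.
Define

$$\alpha_k \triangleq \min \big\{ \alpha \in [0, 1) : w_{\alpha}(x_{|\mathcal{X}|-(k-1)}) = w_{\alpha}(x_{|\mathcal{X}|-k}) \big\}, \quad k \in \{1, \ldots, |\mathcal{X}| - 1\}, \quad \alpha_0 \triangleq 0,$$
$$\Delta \alpha_k \triangleq \alpha_{k+1} - \alpha_k.$$

By Lemma 2 the weights are ordered, hence $\alpha_1$ is the smallest value of $\alpha \in [0, 1)$ for which the
smallest two weights are equal, $w_{\alpha}(x_{|\mathcal{X}|}) = w_{\alpha}(x_{|\mathcal{X}|-1})$; $\alpha_2$ is the smallest value of $\alpha \in [0,1)$
for which the next smallest two weights are equal, $w_{\alpha}(x_{|\mathcal{X}|-1}) = w_{\alpha}(x_{|\mathcal{X}|-2})$, and so forth,
and $\alpha_{|\mathcal{X}|-1}$ is the smallest value of $\alpha \in [0, 1)$ for which the two largest weights are equal,
$w_{\alpha}(x_2) = w_{\alpha}(x_1)$. For a given value of $\alpha \in [0, 1)$, define the minimum over $x \in \mathcal{X}$ of the
weights by $w^{*}_{\alpha} \triangleq \min_{x \in \mathcal{X}} w_{\alpha}(x)$.

Since for $k = 0$, $w_{\alpha_0}(x) = w_0(x) = p(x)$, $\forall x \in \mathcal{X}$, is the set of initial symbol probabilities, let
$\mathcal{U}_0$ denote the singleton set $\{x_{|\mathcal{X}|}\}$. Specifically,

$$\mathcal{U}_0 \triangleq \big\{ x_{|\mathcal{X}|} \big\}. \tag{15}$$

Similarly, $\mathcal{U}_1$ is defined as the set of symbols in $\{x_{|\mathcal{X}|-1}, x_{|\mathcal{X}|}\}$ whose weight evaluated at $\alpha_1$ is
equal to the minimum weight $w^{*}_{\alpha_1}$:

$$\mathcal{U}_1 \triangleq \big\{ x \in \{x_{|\mathcal{X}|-1}, x_{|\mathcal{X}|}\} : w_{\alpha_1}(x) = w^{*}_{\alpha_1} \big\}. \tag{16}$$

In general, for a given value of $\alpha_k$, $k \in \{1, \ldots, |\mathcal{X}| - 1\}$, define

$$\mathcal{U}_k \triangleq \big\{ x \in \{x_{|\mathcal{X}|-k}, \ldots, x_{|\mathcal{X}|}\} : w_{\alpha_k}(x) = w^{*}_{\alpha_k} \big\}. \tag{17}$$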
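Lemma 3. Consider pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ and real-valued prefix codes. For $k \in \{0, 1, 2, \ldots, |\mathcal{X}| - 1\}$,

$$w_{\alpha}(x_{|\mathcal{X}|-k}) = w_{\alpha}(x_{|\mathcal{X}|}) = w^{*}_{\alpha}, \qquad \alpha \in [\alpha_k, \alpha_{k+1}) \subset [0, 1). \tag{18}$$

Further, the cardinality of the set $\mathcal{U}_k$ is $|\mathcal{U}_k| = k + 1$, $k \in \{0, 1, 2, \ldots, |\mathcal{X}| - 1\}$.

Proof: The validity of the statement is shown by perfect induction.

Firstly, for $\alpha = \alpha_1$: $w_{\alpha}(x_{|\mathcal{X}|}) = w_{\alpha}(x_{|\mathcal{X}|-1}) \le w_{\alpha}(x_{|\mathcal{X}|-2}) \le \ldots \le w_{\alpha}(x_1)$.
Suppose that, when $\alpha = \alpha_1 + d\alpha \in [0, 1)$, $d\alpha > 0$, then $w_{\alpha}(x_{|\mathcal{X}|}) \ne w_{\alpha}(x_{|\mathcal{X}|-1})$. Then,

$$\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p}) = \big( \alpha + (1 - \alpha) p(y) \big) l^{*} + \sum_{x \in \mathcal{U}_1^{c}} (1 - \alpha) p(x) l(x),$$

and the weights will be of the form $w_{\alpha}(x) = (1 - \alpha) p(x)$ for $x \in \mathcal{U}_1^{c}$ and $w_{\alpha}(y) = \alpha + (1 - \alpha) p(y)$
for $y \in \mathcal{U}_1$. The rate of change of these weights with respect to $\alpha$ is

$$\frac{\partial w_{\alpha}(x)}{\partial \alpha} = -p(x) < 0, \qquad x \in \mathcal{U}_1^{c}, \tag{19}$$

$$\frac{\partial w_{\alpha}(y)}{\partial \alpha} = 1 - p(y) > 0, \qquad y \in \mathcal{U}_1. \tag{20}$$

Hence, the largest of the two would decrease, while the smallest would increase and therefore
they meet again. This contradicts the assumption that $w_{\alpha}(x_{|\mathcal{X}|}) \ne w_{\alpha}(x_{|\mathcal{X}|-1})$ for $\alpha > \alpha_1$, because
otherwise one of the weights would be smaller and it should be increased with $\alpha$ as in (20).
Therefore, $w_{\alpha}(x_{|\mathcal{X}|}) = w_{\alpha}(x_{|\mathcal{X}|-1})$, $\forall \alpha \in [\alpha_1, 1)$.

Secondly, for $\alpha > \alpha_k$, $k \in \{2, \ldots, |\mathcal{X}| - 1\}$, suppose the weights are

$$w_{\alpha}(x_{|\mathcal{X}|}) = w_{\alpha}(x_{|\mathcal{X}|-1}) = \ldots = w_{\alpha}(x_{|\mathcal{X}|-k}) = w^{*}_{\alpha}.$$

Then, the pay-off is written as

$$\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p}) = \Big[ \alpha + (1 - \alpha) \sum_{x \in \mathcal{U}_k} p(x) \Big] l^{*} + \sum_{x \in \mathcal{U}_k^{c}} (1 - \alpha) p(x) l(x), \qquad \alpha \in (\alpha_k, 1).$$

Hence,

$$\frac{\partial w_{\alpha}(x)}{\partial \alpha} = -p(x) < 0, \qquad x \in \mathcal{U}_k^{c}, \quad \alpha \in (\alpha_k, 1), \tag{21}$$

$$|\mathcal{U}_k| \frac{\partial w^{*}_{\alpha}}{\partial \alpha} = 1 - \sum_{x \in \mathcal{U}_k} p(x) > 0, \qquad x \in \mathcal{U}_k, \quad \alpha \in (\alpha_k, 1). \tag{22}$$

Finally, in the case that $\alpha > \alpha_{k+1}$, $k \in \{2, \ldots, |\mathcal{X}| - 2\}$, if any of the weights $w_{\alpha}(x)$, $x \in \mathcal{U}_k$,
changes differently than another, then either at least one probability will become smaller than the
others and give a higher codeword length, or it will increase faster than the others and hence,
according to (21), it will decrease to meet the other weights. Therefore, the change in this
new set of probabilities should be the same, and the cardinality of $\mathcal{U}$ increases by one, that is,
$|\mathcal{U}_{k+1}| = k + 2$, $k \in \{2, \ldots, |\mathcal{X}| - 2\}$. ■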

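Based on the results of Lemmas 2 and 3, the next theorem describes how the weight vector
$\mathbf{w}_{\alpha}$ changes as a function of $\alpha \in [0, 1)$ so that the solution of the coding problem can be
characterized.

Theorem 1. Consider pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ and real-valued prefix codes.
For $\alpha \in [\alpha_k, \alpha_{k+1})$, $k \in \{0, 1, \ldots, |\mathcal{X}| - 1\}$, the optimal weights

$$\mathbf{w}^{\dagger}_{\alpha} \triangleq \{ w^{\dagger}_{\alpha}(x) : x \in \mathcal{X} \} \equiv \big( w^{\dagger}_{\alpha}(x_1), w^{\dagger}_{\alpha}(x_2), \ldots, w^{\dagger}_{\alpha}(x_{|\mathcal{X}|}) \big)$$

are given by

$$w^{\dagger}_{\alpha}(x) = \begin{cases}
(1 - \alpha) p(x), & x \in \mathcal{U}_k^{c}, \\[4pt]
w^{*}_{\alpha_k} + (\alpha - \alpha_k) \dfrac{\sum_{x \in \mathcal{U}_k^{c}} p(x)}{|\mathcal{U}_k|}, & x \in \mathcal{U}_k,
\end{cases} \tag{23}$$

where $\mathcal{U}_k$ is given by (17) and

$$\alpha_{k+1} = \alpha_k + (1 - \alpha_k) \frac{p(x_{|\mathcal{X}|-(k+1)}) - p(x_{|\mathcal{X}|-k})}{\dfrac{\sum_{x \in \mathcal{U}_k^{c}} p(x)}{|\mathcal{U}_k|} + p(x_{|\mathcal{X}|-(k+1)})}. \tag{24}$$

Moreover, the minimum $\alpha$, called $\alpha_{\max}$, such that for $\alpha \in [\alpha_{\max}, 1]$ there is no compression, is
given by

$$\alpha_{\max} = 1 - \frac{1}{|\mathcal{X}| p(x_1)}. \tag{25}$$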

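Proof: By Lemma 3, for $\alpha \in [\alpha_k, \alpha_{k+1})$, the lowest probabilities become equal and change
together, forming a total weight given by

$$\sum_{x \in \mathcal{U}_k} w_{\alpha}(x) = |\mathcal{U}_k| w^{*}_{\alpha} = \alpha + (1 - \alpha) p(x_{|\mathcal{X}|}) + \ldots + (1 - \alpha) p(x_{|\mathcal{X}|-k}).$$

Hence,

$$|\mathcal{U}_k| \frac{\partial w^{*}_{\alpha}}{\partial \alpha} = 1 - \sum_{x \in \mathcal{U}_k} p(x), \tag{26}$$

$$\frac{\partial w^{*}_{\alpha}}{\partial \alpha} = \frac{1 - \sum_{x \in \mathcal{U}_k} p(x)}{|\mathcal{U}_k|} = \frac{\sum_{x \in \mathcal{U}_k^{c}} p(x)}{|\mathcal{U}_k|}. \tag{27}$$

By letting $\delta_k(\alpha) \triangleq \alpha - \alpha_k$, then

$$w^{*}_{\alpha} = w^{*}_{\alpha_k} + \delta_k(\alpha) \frac{\sum_{x \in \mathcal{U}_k^{c}} p(x)}{|\mathcal{U}_k|}, \qquad x \in \mathcal{U}_k, \tag{28}$$

and $w_{\alpha}(x) = (1 - \alpha) p(x)$, $x \in \mathcal{U}_k^{c}$. When $\delta_k(\alpha)|_{\alpha = \alpha_{k+1}} = \alpha_{k+1} - \alpha_k$, then $w_{\alpha_{k+1}}(x_{|\mathcal{X}|-(k+1)}) =
w^{*}_{\alpha_{k+1}}$, and

$$(1 - \alpha_{k+1}) p(x_{|\mathcal{X}|-(k+1)}) = w^{*}_{\alpha_k} + \delta_k(\alpha_{k+1}) \frac{\sum_{x \in \mathcal{U}_k^{c}} p(x)}{|\mathcal{U}_k|}.$$

After some manipulations, $\alpha_{k+1}$ is given by

$$\alpha_{k+1} = \alpha_k + (1 - \alpha_k) \frac{p(x_{|\mathcal{X}|-(k+1)}) - p(x_{|\mathcal{X}|-k})}{\dfrac{\sum_{x \in \mathcal{U}_k^{c}} p(x)}{|\mathcal{U}_k|} + p(x_{|\mathcal{X}|-(k+1)})}. \tag{29}$$

When there exists no compression all the weights are equal. Hence,

$$w^{*}_{\alpha_{\max}} = \frac{1}{|\mathcal{X}|}. \tag{30}$$

The minimum $\alpha$ beyond which there is no compression is the $\alpha$ at which all the weights
become equal for the first time. This is the case when $(1 - \alpha_{\max}) p(x_1) = w^{*}_{\alpha_{\max}}$, or equivalently

$$\alpha_{\max} = 1 - \frac{1}{|\mathcal{X}| p(x_1)}.$$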

Theorem 1 facilitates the computation of the optimal real-valued prefix codeword length vector $\mathbf{l}^{\dagger}$
minimizing pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$ as a function of $\alpha \in [0, 1)$ and the initial source probability vector
$\mathbf{p}$, via re-normalization and merging. Specifically, the optimal weights are found by recursively
calculating $\alpha_k$, $k \in \{0, 1, \ldots, |\mathcal{X}| - 1\}$. For any specific $\alpha \in [0, 1)$ an algorithm is given next,
which describes how to obtain the optimal real-valued prefix codeword lengths minimizing
pay-off $\mathbb{L}_{\alpha}(\mathbf{l}, \mathbf{p})$.

A. An Algorithm for Computing the Optimal Weights

For any probability distribution $\mathbf{p} \in \mathbb{P}(\mathcal{X})$ and $\alpha \in [0, 1)$ an algorithm is presented to compute
the optimal weight vector $\mathbf{w}_{\alpha}$ of Theorem 1. By Theorem 1 (see also Fig. 1 for a schematic
representation of the weights for different values of $\alpha$), the weight vector $\mathbf{w}_{\alpha}$ changes piecewise
linearly as a function of $\alpha \in [0, 1)$. The value of $\alpha_{\max}$ is also indicated.



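[Figure 1: weights $w_{\alpha_0}(x)$, $w_{\alpha_1}(x)$, $w_{\alpha_2}(x)$, $w_{\alpha_3}(x)$ (vertical axis: weight) plotted against $\alpha \in [0, 1)$ (horizontal axis), with the increments $\Delta\alpha_1$, $\Delta\alpha_2$, $\Delta\alpha_3$ marked.]

Fig. 1. A schematic representation of the weights for different values of $\alpha$.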



Given a specific value of $\hat{\alpha} \in [0, 1)$, in order to calculate the weights $w_{\hat{\alpha}}(x)$, it is sufficient to
determine the values of $\alpha$ at the intersections by using (24), up to the value of $\alpha$ for which the
intersection gives a value greater than $\hat{\alpha}$, or up to the last intersection (if all the intersections
give a smaller value of $\alpha$) at $\alpha_{\max}$, beyond which there is no compression. For example, if
$\alpha_1 < \hat{\alpha} < \alpha_2$, find all $\alpha$'s at the intersections up to and including $\alpha_2$ and subsequently, the
weights at $\hat{\alpha}$ can be found by using (23). Specifically, check first if $\hat{\alpha} \ge \alpha_{\max}$. If yes, then
the weights are equal to $1/|\mathcal{X}|$. If $\hat{\alpha} < \alpha_{\max}$, then find $\alpha_1, \ldots, \alpha_m$, $m \in \mathbb{N}$, $m \ge 1$, until
$\alpha_{m-1} \le \hat{\alpha} < \alpha_m$. As soon as the $\alpha$'s at the intersections are found, the weights at $\hat{\alpha}$ can be
found by using (23). The algorithm is easy to implement and extremely fast due to its low
computational complexity. The worst case scenario appears when $\alpha_{|\mathcal{X}|-2} < \hat{\alpha} < \alpha_{\max} = \alpha_{|\mathcal{X}|-1}$,
in which all $\alpha$'s at the intersections are required to be found. Note that, if $\hat{\alpha}$ is closer to $\alpha_{\max}$,
then it is easier to find $\alpha_{\max}$ first and then to implement the algorithm backwards. In general,
the worst case complexity of the algorithm is $O(n)$. The complete algorithm is depicted under
Algorithm 1.

Algorithm 1 Algorithm for Computing the Weight Vector $\mathbf{w}_{\alpha}$ for Problem 1

initialize
    $\mathbf{p} = (p(x_1), p(x_2), \ldots, p(x_{|\mathcal{X}|}))$, $\alpha = \hat{\alpha}$
    $k = 0$, $\alpha_0 = 0$, $\alpha_{\max} = 1 - 1/(|\mathcal{X}| p(x_1))$
if $\alpha \ge \alpha_{\max}$ then
    return $w^{\dagger}_{\alpha}(x) = 1/|\mathcal{X}|$, $\forall x \in \mathcal{X}$
end if
while $\alpha_k \le \alpha < \alpha_{\max}$ do
    Calculate $\alpha_{k+1}$:
        $\alpha_{k+1} = \alpha_k + (1 - \alpha_k) \dfrac{p(x_{|\mathcal{X}|-(k+1)}) - p(x_{|\mathcal{X}|-k})}{\frac{\sum_{x \in \mathcal{U}_k^{c}} p(x)}{|\mathcal{U}_k|} + p(x_{|\mathcal{X}|-(k+1)})}$
    $k \leftarrow k + 1$
end while
$k \leftarrow k - 1$
Calculate $\mathbf{w}^{\dagger}_{\alpha}$:
for $v = 1$ to $|\mathcal{X}| - (k + 1)$ do
    $w^{\dagger}_{\alpha}(x_v) = (1 - \alpha) p(x_v)$
    $v \leftarrow v + 1$
end for
Calculate $w^{*}_{\alpha}$:
    $w^{*}_{\alpha} = (1 - \alpha_k) p(x_{|\mathcal{X}|-k}) + (\alpha - \alpha_k) \dfrac{\sum_{x \in \mathcal{U}_k^{c}} p(x)}{|\mathcal{U}_k|}$
for $v = |\mathcal{X}| - k$ to $|\mathcal{X}|$ do
    $w^{\dagger}_{\alpha}(x_v) = w^{*}_{\alpha}$
    $v \leftarrow v + 1$
end for
return $\mathbf{w}^{\dagger}_{\alpha}$
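
A runnable sketch of Algorithm 1 is given below for illustration. It is not the authors' implementation: the function name `optimal_weights` is hypothetical, the input is assumed to be a strictly positive probability vector sorted in non-increasing order, and the merged weight is computed from the closed form $w^{*}_{\alpha} = \big(\alpha + (1-\alpha)\sum_{x \in \mathcal{U}_k} p(x)\big)/|\mathcal{U}_k|$ used in the proof of Theorem 1 rather than from the incremental expression (23). The thresholds follow the recursion (24).

```python
def optimal_weights(p, alpha):
    """Weight vector of Theorem 1 for 0 <= alpha < 1.

    p: probabilities sorted in non-increasing order, all > 0, summing to one.
    """
    n = len(p)
    alpha_max = 1 - 1 / (n * p[0])               # equation (25)
    if alpha >= alpha_max:
        return [1 / n] * n                       # no compression: uniform weights

    k, a_k = 0, 0.0                              # U_k holds the last k + 1 symbols
    while k < n - 1:
        m = k + 1                                # |U_k|
        rest = sum(p[: n - m])                   # probability mass of U_k^c
        nxt = p[n - m - 1]                       # p(x_{|X|-(k+1)})
        a_next = a_k + (1 - a_k) * (nxt - p[n - m]) / (rest / m + nxt)  # (24)
        if a_next > alpha:
            break                                # alpha lies in [a_k, a_next)
        k, a_k = k + 1, a_next

    m = k + 1
    w_star = (alpha + (1 - alpha) * sum(p[n - m:])) / m   # merged weight on U_k
    return [(1 - alpha) * q for q in p[: n - m]] + [w_star] * m


p = [0.5, 0.25, 0.125, 0.125]
for alpha in (0.0, 0.1, 0.3, 0.5):
    print(alpha, optimal_weights(p, alpha))
```

For this example source the intersections occur at $\alpha_1 = 0$, $\alpha_2 = 0.2$ and $\alpha_3 = \alpha_{\max} = 0.5$, so the four values of $\alpha$ above fall in different merging regimes. Recomputing the partial sums inside the loop keeps the sketch short; maintaining a running sum instead would give the linear worst-case cost stated in the text.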



III. Optimal Codeword Lengths 

This section presents the complete characterization of the optimal real-valued codeword length
vectors $\mathbf{l}^{\dagger} \in \mathcal{L}\big(\mathbb{R}_+^{|\mathcal{X}|}\big)$ of the pay-off stated under Problem 1. Further, a coding theorem is derived
and relations to limited-length coding and coding with general pay-off criteria are described.
The related problems are stated under Problem 2 and Problem 3. Finally, the application of the
new codes in the context of universal coding applications in which the source probability vector
belongs to a specific class is discussed.

In view of Lemma 1 (and the discussion following it) and Theorem 1, the main theorem which
gives the optimal codeword length vector is presented.

Theorem 2. Consider Problem^for any a G [0,1). The optimal prefix code 1^ G r!^' minimizing 
pay-off L Q (l,p) is given by 



log ((1 - a)p(x)\ = w a (x), 



x G U, 



_1 °g(^ )=w a {x), x eU k . 

Here a G [a*., afc+i) C [0, 1), G {1, . . . , — 1}, and ak,ak+i are found from Theorem^ 

Proof: (31 ) follows from Lemma[T]while the specific a G [a*, a^+i) follow from Theorem[T] 



Note that for $\alpha = 0$ Theorem 2 corresponds to the Shannon solution $l^{sh}(x) = -\log p(x)$, while for $\alpha \in [\alpha_{\max},1)$ the weight vector $\mathbf{w}_\alpha$ is identically distributed, and hence $w_\alpha(x) = 1/|\mathcal{X}|$ and $l^\dagger_\alpha(x) = \log|\mathcal{X}|$. The behavior of $w_\alpha(x)$ and $l^\dagger_\alpha(x)$ as a function of $\alpha \in [0,1)$ is described in the next section via illustrative examples. Clearly, by rounding off the optimal codeword lengths via $l^\ddagger_\alpha(x) \triangleq \lceil -\log w_\alpha(x) \rceil$, one obtains $\mathbb{H}(\mathbf{w}_\alpha) \leq \sum_{x\in\mathcal{X}} l^\ddagger_\alpha(x)\,w_\alpha(x) < \mathbb{H}(\mathbf{w}_\alpha) + 1$. Note that one may fix the minimum or maximum lengths in (31) and find the value of $\alpha \in [0,1)$ which gives these specific lengths. This observation will be discussed in detail in Section III-B.
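The rounding step can be illustrated with a short Python sketch. The helper names `shannon_round` and `entropy` and the weight vector are hypothetical, chosen only for illustration; the sketch assumes a strictly positive weight vector that sums to one and checks the Kraft inequality together with the entropy bounds stated above.

```python
import math

def shannon_round(weights, D=2):
    """Return (real-valued lengths, integer lengths) for a weight vector."""
    real = [-math.log(w) / math.log(D) for w in weights]
    # Small tolerance so exact powers of D are not rounded up by float noise.
    integer = [math.ceil(l - 1e-12) for l in real]
    return real, integer

def entropy(weights, D=2):
    """Entropy of the weight vector in base D."""
    return -sum(w * math.log(w) / math.log(D) for w in weights)

w = [0.4, 0.3, 0.2, 0.1]                     # hypothetical weights, sum to 1
real, integer = shannon_round(w)
avg = sum(wi * li for wi, li in zip(w, integer))   # weighted integer length
kraft = sum(2.0 ** -li for li in integer)          # Kraft sum of rounded code
```

The weighted length `avg` lies between the entropy of the weights and the entropy plus one, and the Kraft sum does not exceed one, so a prefix code with these integer lengths exists.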

The following proposition shows that the optimal pay-off is a non-decreasing and concave function of $\alpha$.

Proposition 1. The optimal pay-off $\mathbb{L}_\alpha(\mathbf{l}^\dagger,\mathbf{p})$ is a non-decreasing concave function of $\alpha \in [0,1)$.

Proof: See Appendix A-B.
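A numerical sanity check of this statement can be sketched as follows. The function `optimal_payoff` is a hypothetical helper: it builds the merged weight vector directly (smallest merged set whose common weight does not exceed the next scaled probability) and evaluates the optimal pay-off as the entropy of the weights, as in the proof in Appendix A-B. The grid and tolerances are arbitrary choices.

```python
import math

def optimal_payoff(p, alpha):
    """Entropy (base 2) of the merged weight vector w_alpha; p must be > 0."""
    q = sorted(p, reverse=True)
    n = len(q)
    for m in range(1, n + 1):                 # m = size of the merged set U
        tail = sum(q[n - m:])
        w_star = (alpha + (1 - alpha) * tail) / m
        if m == n or (1 - alpha) * q[n - m - 1] >= w_star:
            break
    w = [(1 - alpha) * x for x in q[:n - m]] + [w_star] * m
    return -sum(x * math.log2(x) for x in w)

p = [8 / 15, 4 / 15, 2 / 15, 1 / 15]
grid = [i / 100 for i in range(100)]
vals = [optimal_payoff(p, a) for a in grid]
```

On this grid the values start at the entropy of the source, increase with diminishing slope, and stay at $\log_2 4 = 2$ once no compression is possible.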



A. Coding Theorem 

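This section proves a coding theorem by considering sources which generate symbols independently. Let $\mathcal{X}^n = \times_{i=1}^n \mathcal{X}$ denote the $n$th extension of the source, which generates symbols in $\mathcal{X}$ independently according to $\mathbf{p} \in \mathbb{P}(\mathcal{X})$ (i.e., the extension source is memoryless). A typical realization of the $n$th extension source, $x^n \in \mathcal{X}^n$, is an $n$-tuple of the form $x^n = (x_{i_1}, x_{i_2}, \ldots, x_{i_n})$, $x_{i_j} \in \mathcal{X}$, $1 \leq j \leq n$. Since the symbols are independently generated, $p(x^n) = p(x_{i_1})p(x_{i_2})\cdots p(x_{i_n})$. Let $l(x^n)$ denote the length of some uniquely decodable code for a given realization $x^n \in \mathcal{X}^n$. Then, the maximum and average length pay-off for such $n$-tuple sequences $x^n$ is defined by

$$\begin{aligned} \mathbb{L}^n_\alpha(\mathbf{l},\mathbf{p}) &\triangleq \alpha \max_{x^n\in\mathcal{X}^n} l(x^n) + (1-\alpha) \sum_{x^n\in\mathcal{X}^n} l(x^n)p(x^n) \\ &= \Big(\alpha + (1-\alpha)\sum_{x^n\in\mathcal{U}^n} p(x^n)\Big)\, l^* + \sum_{x^n\in\mathcal{U}^{n,c}} (1-\alpha)p(x^n)\,l(x^n) \\ &= \sum_{x^n\in\mathcal{X}^n} w_\alpha(x^n)\,l(x^n), \quad \alpha \in [0,1), \end{aligned}$$

where $l^* \triangleq \max_{x^n\in\mathcal{X}^n} l(x^n)$, $\mathcal{U}^n \triangleq \{x^n \in \mathcal{X}^n : l(x^n) = l^*\}$, $\mathcal{X}^n = \mathcal{U}^n \cup \mathcal{U}^{n,c}$, and $\sum_{x^n\in\mathcal{U}^n} w_\alpha(x^n) = \alpha + (1-\alpha)\sum_{x^n\in\mathcal{U}^n} p(x^n)$, $w_\alpha(x^n) = (1-\alpha)p(x^n)$, $x^n \in \mathcal{U}^{n,c}$. Let $l(x^n)$ be the integer length vector which satisfies

$$-\log w_\alpha(x^n) \leq l(x^n) < -\log w_\alpha(x^n) + 1 \tag{32}$$

where

$$w_\alpha(x^n) = \begin{cases} (1-\alpha)p(x^n), & x^n \in \mathcal{U}^{n,c}, \\ \dfrac{\alpha + (1-\alpha)\sum_{x^n\in\mathcal{U}^n} p(x^n)}{|\mathcal{U}^n|}, & x^n \in \mathcal{U}^n. \end{cases} \tag{33}$$

Then the maximum and average length pay-off per source symbol, $\frac{1}{n}\mathbb{L}^n_\alpha(\mathbf{l},\mathbf{p})$, satisfies

$$\frac{1}{n}\mathbb{H}(w_\alpha(x^n)) \leq \frac{1}{n}\mathbb{L}^n_\alpha(\mathbf{l},\mathbf{p}) < \frac{1}{n}\mathbb{H}(w_\alpha(x^n)) + \frac{1}{n}. \tag{34}$$

Hence, by choosing $n$ sufficiently large, $\frac{1}{n}\mathbb{L}^n_\alpha(\mathbf{l},\mathbf{p})$ can be made arbitrarily close to the lower bound $\frac{1}{n}\mathbb{H}(w_\alpha(x^n))$. Define the entropy rate of $w_\alpha(x^n)$ by

$$\mathcal{H}(\mathbf{w}_\alpha) = \lim_{n\to\infty} \frac{1}{n}\mathbb{H}(w_\alpha(x^n)). \tag{35}$$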

Then, the following coding theorem is obtained. 

Theorem 3. Consider a discrete source with alphabet $\mathcal{X}$ generating symbols independently according to $\mathbf{p} \in \mathbb{P}(\mathcal{X})$. Then, by encoding sufficiently long sequences of $n$ source symbols with a uniquely decodable code, it is possible to make the maximum and average length pay-off per source symbol, $\frac{1}{n}\mathbb{L}^n_\alpha(\mathbf{l},\mathbf{p})$, arbitrarily close to the entropy rate $\mathcal{H}(\mathbf{w}_\alpha)$. Moreover, it is not possible to find a uniquely decodable code whose maximum and average length pay-off per source symbol, $\frac{1}{n}\mathbb{L}^n_\alpha(\mathbf{l},\mathbf{p})$, is less than the entropy rate $\mathcal{H}(\mathbf{w}_\alpha)$.

Proof: The first part of the theorem follows from the above discussion. The second part of the theorem follows from the discussion below Lemma 1. ■



B. Limited-Length Shannon Coding 

Note that from the characterization of optimal codes for Problem 1 one can also obtain, as a special case, the characterization of optimal codes minimizing the average codeword length subject to a hard constraint on the maximum codeword length, as defined below.

Problem 2. Given a known source probability vector $\mathbf{p} \in \mathbb{P}(\mathcal{X})$ and a hard constraint $L_{\lim} \in [1,\infty)$, find a prefix code length vector $\mathbf{l}^* \in \mathbb{R}_+^{|\mathcal{X}|}$ which minimizes the Average Length Subject to Maximum Length Constraint pay-off $\mathbb{L}(\mathbf{l},\mathbf{p})$ defined by

$$\mathbb{L}(\mathbf{l},\mathbf{p}) \triangleq \sum_{x\in\mathcal{X}} l(x)p(x), \tag{36a}$$

$$\text{subject to } \max_{x\in\mathcal{X}} l(x) \leq L_{\lim}. \tag{36b}$$

Limited-length coding problems are of interest in various applications, such as distributed systems that are delay-sensitive and require short codewords and/or fast coders with small code tables. It is important to note that the solution of Problem 2 does not in general give the solution of Problem 1. For integer-valued prefix codes $\mathbf{l}^* \in \mathbb{Z}_+^{|\mathcal{X}|}$, the solution of Problem 2 is addressed in [16] via a dynamic programming approach. This led to the so-called length-limited Huffman algorithm, investigated extensively in the literature (for more details, see [16] and references therein).

Here it is noticed that by introducing a real-valued Lagrange multiplier $\mu$ associated with the constraint on the maximum length, the unconstrained pay-off is defined by

$$\begin{aligned} \mathbb{L}(\mathbf{l},\mathbf{p},\mu) &\triangleq \sum_{x\in\mathcal{X}} l(x)p(x) + \mu\Big(\max_{x\in\mathcal{X}} l(x) - L_{\lim}\Big), \quad \mu \geq 0 \\ &= \mu \max_{x\in\mathcal{X}} l(x) + \sum_{x\in\mathcal{X}} l(x)p(x) - \mu L_{\lim}. \end{aligned} \tag{37}$$

Hence, the optimal code for Problem 2 is obtained from the optimal code solution of Problem 1 by substituting $\mu = \alpha/(1-\alpha)$, and then relating the value of the Lagrange multiplier to the specific value of $\alpha$ for which the codeword lengths are limited by $L_{\lim}$. The complete characterization of the optimal codes and the associated coding algorithm are given next.

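Theorem 4. Consider Problem 2 for any $\alpha \in [0,1)$. The optimal prefix code $\mathbf{l}^\dagger \in \mathbb{R}_+^{|\mathcal{X}|}$ minimizing the pay-off $\mathbb{L}(\mathbf{l},\mathbf{p})$ is given by

$$l^\dagger(x) = \begin{cases} -\log\big((1-\alpha)p(x)\big), & x \in \mathcal{U}_k^c, \\ -\log\Big(\dfrac{\alpha + (1-\alpha)\sum_{x\in\mathcal{U}_k} p(x)}{|\mathcal{U}_k|}\Big), & x \in \mathcal{U}_k, \end{cases}$$

where

$$\alpha = \begin{cases} 0, & \text{if } -\log\big(p(x_{|\mathcal{X}|})\big) \leq L_{\lim}, \\ 1 - \dfrac{1 - |\mathcal{U}_k|\,D^{-L_{\lim}}}{\sum_{x\in\mathcal{U}_k^c} p(x)}, & \text{if } -\log\Big(\dfrac{\sum_{x\in\mathcal{X}} p(x)}{|\mathcal{X}|}\Big) < L_{\lim} < -\log\big(p(x_{|\mathcal{X}|})\big), \\ \alpha_{\max}, & \text{if } L_{\lim} = -\log\Big(\dfrac{\sum_{x\in\mathcal{X}} p(x)}{|\mathcal{X}|}\Big). \end{cases}$$

If $L_{\lim} < -\log\Big(\dfrac{\sum_{x\in\mathcal{X}} p(x)}{|\mathcal{X}|}\Big)$, there is no feasible solution to Problem 2.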

Proof: Note that the pay-off $\mathbb{L}(\mathbf{l},\mathbf{p})$ is a convex function and the constraint set is convex; hence this is a convex optimization problem. By introducing a real-valued Lagrange multiplier $\mu$ associated with the maximum length constraint, the augmented pay-off is equivalent to the pay-off $\mathbb{L}_\alpha(\mathbf{l},\mathbf{p})$ in Problem 1 by setting $\mu = \alpha/(1-\alpha)$. The bound on the maximum length, $L_{\lim}$, determines the value of $\alpha$ for which the maximum codeword length is less than $L_{\lim}$. If $L_{\lim}$ is greater than or equal to the maximum codeword length for $\alpha = 0$ (i.e., for $\alpha = 0$, $\|\mathbf{l}\|_\infty = -\log(p(x_{|\mathcal{X}|}))$), then the maximum codeword length for all $\alpha \in [0,1)$ will be smaller than $L_{\lim}$. If $L_{\lim}$ is smaller than the length $-\log\big(\sum_{x\in\mathcal{X}} p(x)/|\mathcal{X}|\big)$, for which there is no compression, then the maximum length cannot be smaller, and therefore, for $\alpha \geq \alpha_{\max}$ there is no feasible solution to Problem 2. If, however, $-\log\big(\sum_{x\in\mathcal{X}} p(x)/|\mathcal{X}|\big) < L_{\lim} < -\log(p(x_{|\mathcal{X}|}))$, then using the expression for the maximum length in Theorem 2 we have

$$L_{\lim} = -\log\Big(\frac{\alpha + (1-\alpha)\sum_{x\in\mathcal{U}_k} p(x)}{|\mathcal{U}_k|}\Big).$$

As a result, after manipulation $\alpha$ is given by

$$\alpha = 1 - \frac{1 - |\mathcal{U}_k|\,D^{-L_{\lim}}}{\sum_{x\in\mathcal{U}_k^c} p(x)}. \tag{38}$$

From (38), it is evident that the cardinality $|\mathcal{U}_k|$ of $\mathcal{U}_k$ and the symbols $x \in \mathcal{U}_k$ should be known in order to calculate $\sum_{x\in\mathcal{U}_k^c} p(x)$. ■

Next, a new algorithm (Algorithm 2) is introduced to calculate the complete solution of Problem 2, and hence the value of $\alpha$ and the weight vector $\mathbf{w}_\alpha$ such that the maximum codeword length is upper bounded by $L_{\lim}$.



August 20, 2012 



DRAFT 



23 



Algorithm 2 Algorithm for Computing $\hat\alpha$ and the Weight Vector $\mathbf{w}_\alpha$ for Problem 2

initialize
    $\mathbf{p} = (p(x_1), p(x_2), \ldots, p(x_{|\mathcal{X}|}))$, $k = 0$, $\alpha_0 = 0$,
    $l_{\max} = -\log\big(\min_{x\in\mathcal{X}} p(x)\big)$, $l_{\min} = -\log\big(\sum_{x\in\mathcal{X}} p(x)/|\mathcal{X}|\big)$
if $L_{\lim} \geq l_{\max}$ then
    $\hat\alpha = 0$
else if $L_{\lim} < l_{\min}$ then
    No feasible $\alpha$ exists.
else
    while $L_{\lim} < l_{\max}$ do
        $$\alpha_{k+1} = \alpha_k + (1-\alpha_k)\,\frac{p(x_{|\mathcal{X}|-(k+1)}) - p(x_{|\mathcal{X}|-k})}{\frac{\sum_{x\in\mathcal{U}_k^c} p(x)}{k+1} + p(x_{|\mathcal{X}|-(k+1)})}$$
        $w^*_{k+1} = (1-\alpha_{k+1})\,p(x_{|\mathcal{X}|-(k+1)})$
        $l_{\max} = -\log(w^*_{k+1})$
        $k \leftarrow k+1$
    end while
    $k \leftarrow k-1$
    $$\hat\alpha = 1 - \frac{1 - (k+1)\,D^{-L_{\lim}}}{\sum_{x\in\mathcal{U}_k^c} p(x)}$$
    for $v = 1$ to $|\mathcal{X}|-(k+1)$ do
        $w^\dagger_\alpha(x_v) = (1-\hat\alpha)\,p(x_v)$
        $v \leftarrow v+1$
    end for
    $$w^*_\alpha = (1-\alpha_k)\,p(x_{|\mathcal{X}|-k}) + (\hat\alpha-\alpha_k)\,\frac{\sum_{x\in\mathcal{U}_k^c} p(x)}{k+1}$$
    for $v = |\mathcal{X}|-k$ to $|\mathcal{X}|$ do
        $w^\dagger_\alpha(x_v) = w^*_\alpha$
        $v \leftarrow v+1$
    end for
end if
return $w^\dagger_\alpha$

Even though Algorithms 1 and 2 are similar, there are some basic differences. Algorithm 1 is given a value of $\alpha$, for which it identifies the cardinality of $\mathcal{U}$ and hence specifies the weight vector $\mathbf{w}_\alpha$. Algorithm 2, on the other hand, uses the maximum length to determine whether there exists a feasible $\alpha$ for which the limited-length constraint is fulfilled. Then, if feasibility is guaranteed, the cardinality is specified by comparing the optimum lengths at the merging points with the specified maximum length. Given the cardinality, the corresponding $\hat\alpha$ is specified and finally, in the same way as in Algorithm 1, the weight vector $\mathbf{w}_\alpha$ is specified.
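The limited-length computation can be sketched in Python as follows. This is a hedged sketch, not a line-by-line transcription of Algorithm 2: the function name `limited_length_alpha` is hypothetical, and instead of iterating over merge thresholds it searches directly for the smallest merged set whose common weight $D^{-L_{\lim}}$ does not exceed the next scaled probability, then applies expression (38). Feasibility is tested on weights rather than on logarithms to avoid rounding issues.

```python
def limited_length_alpha(p, L_lim, D=2):
    """Return alpha for which the longest real-valued codeword equals L_lim.

    Returns 0.0 if the Shannon lengths already satisfy the constraint,
    and None if no feasible alpha exists. Assumes all p(x) > 0.
    """
    q = sorted(p, reverse=True)
    n = len(q)
    w = float(D) ** (-L_lim)            # weight of the longest codewords
    if w <= q[-1]:
        return 0.0                      # L_lim >= -log(min p): alpha = 0
    if n * w > 1:
        return None                     # L_lim < log |X|: infeasible
    for m in range(1, n):               # m = |U_k| = k + 1
        tail = sum(q[n - m:])           # probability mass of U_k
        alpha = 1 - (1 - m * w) / (1 - tail)        # expression (38)
        if (1 - alpha) * q[n - m - 1] >= w:         # next symbol stays unmerged
            return alpha
    return 1 - 1 / (n * q[0])           # all symbols merged: alpha_max

p = [x / 26 for x in (1, 1, 2, 2, 2, 4, 5, 9)]
alpha_4 = limited_length_alpha(p, 4)
```

For the eight-symbol source of Section IV-B this reproduces the values quoted there for $L_{\lim} = 5$, $4$ and $3$.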

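C. General Pay-Off and Limiting Problem

Problem 1 can be further modified by noticing that $\frac{1}{t}\log\sum_{x\in\mathcal{X}} p(x)D^{t\,l(x)}$ is a nondecreasing function of $t \in [0,\infty)$, and $\lim_{t\to\infty} \frac{1}{t}\log\sum_{x\in\mathcal{X}} p(x)D^{t\,l(x)} = \max_{x\in\mathcal{X}} l(x)$. Hence, by replacing $\alpha\max_{x\in\mathcal{X}} l(x)$ in $\mathbb{L}_\alpha(\mathbf{l},\mathbf{p})$ by the function $\frac{\alpha}{t}\log\big(\sum_{x\in\mathcal{X}} p(x)D^{t\,l(x)}\big)$, the resulting pay-off takes into account moderate values below $\max_{x\in\mathcal{X}} l(x)$, obtaining a two-parameter pay-off. The pay-off resulting from this observation is defined next, and its solution is discussed.

Problem 3. Given a known source probability vector $\mathbf{p} \in \mathbb{P}(\mathcal{X})$, weighting parameter $\alpha \in [0,1)$, and parameter $t \in (-\infty,\infty)$, find a prefix code length vector $\mathbf{l}^* \in \mathbb{R}_+^{|\mathcal{X}|}$ which minimizes the two-parameter Average of Linear and Exponential Functions of Length pay-off $\mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p})$ defined by

$$\mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p}) \triangleq \frac{\alpha}{t}\log\Big(\sum_{x\in\mathcal{X}} p(x)D^{t\,l(x)}\Big) + (1-\alpha)\sum_{x\in\mathcal{X}} l(x)p(x) \tag{39}$$

for all $\alpha \in [0,1)$ and $t \in (-\infty,\infty)$.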
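The behavior of the exponential term in $t$ can be checked numerically with the following sketch. The helper `exp_length_term`, the probability vector, and the length vector are hypothetical choices for illustration; the sum is evaluated relative to the largest length so that large values of $t$ do not overflow.

```python
import math

def exp_length_term(p, lengths, t, D=2):
    """(1/t) * log_D( sum_x p(x) * D**(t*l(x)) ), evaluated stably for t > 0."""
    l_max = max(lengths)
    # Factor out D**(t*l_max) so every remaining exponent is <= 0.
    s = sum(px * D ** (t * (l - l_max)) for px, l in zip(p, lengths))
    return l_max + math.log(s) / (t * math.log(D))

p = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]
ts = [0.01, 0.1, 1, 10, 100, 1000]
values = [exp_length_term(p, lengths, t) for t in ts]
mean_length = sum(px * l for px, l in zip(p, lengths))
```

For small $t$ the term is close to the average length, and as $t$ grows it increases towards the maximum length, which is the interpolation that the two-parameter pay-off exploits.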

Although the solution of Problem 3 will be investigated for $t \in [0,\infty)$, the problem is also well defined for $t \in (-\infty,0)$. The above pay-off is a convex combination of the average of an exponential function of the codeword length and the average codeword length. Moderate values of $t \in [0,\infty)$ are also of interest, since the pay-off $\mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p})$ can be interpreted as a trade-off between universal codes and average length codes. Thus, for a fixed value of $\alpha \in [0,1)$, and since $\mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p})$ is non-decreasing with respect to the parameter $t \in (0,\infty)$, $t$ is another design parameter, which can be selected so that the average codeword length is below $\mathbb{L}_\alpha(\mathbf{l},\mathbf{p})$.






The case $\alpha = 1$ is investigated in [4]-[8], [10], where relations to minimizing buffer overflow probability are discussed. Further, it is not difficult to verify that $\mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p})|_{\alpha=1}$ is also the dual problem of universal coding problems, formulated as a minimax, in which the maximization is over a class of probability distributions which satisfy a relative entropy constraint with respect to a given fixed nominal probability distribution [14], [17]. Hence, the pay-off $\mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p})|_{\alpha=1}$ encompasses a trade-off between universal codes, buffer overflow probability, and average codeword length codes.

As in Problem 1, a slight modification of the two-parameter pay-off to the convex combination of the average of an exponential function of the pointwise redundancy and the average pointwise redundancy, $\mathbb{L}_{t,\alpha}(\mathbf{l}+\log\mathbf{p},\mathbf{p})$, is of interest for integer-valued codes, since the real-valued codes minimizing this pay-off are $l^\dagger(x) = -\log p(x)$, $x \in \mathcal{X}$. To the best of our knowledge, only the special cases $\alpha = 0$ and $\alpha = 1$ have been investigated for the pay-off $\mathbb{L}_{t,\alpha}(\mathbf{l}+\log\mathbf{p},\mathbf{p})$ (see, e.g., [8]).

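Theorem 5. Consider Problem 3 for any $\alpha \in [0,1)$, $t \in [0,\infty)$. The optimal prefix code $\mathbf{l}^\dagger \in \mathbb{R}_+^{|\mathcal{X}|}$ minimizing the pay-off $\mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p})$ is given by

$$l^\dagger_{t,\alpha}(x) = -\log\big(\alpha\,\nu_{t,\alpha}(x) + (1-\alpha)\,p(x)\big), \quad x \in \mathcal{X}, \tag{40}$$

where $\{\nu_{t,\alpha}(x) : x \in \mathcal{X}\}$ is defined via the tilted probability distribution

$$\nu_{t,\alpha}(x) \triangleq \frac{D^{t\,l^\dagger_{t,\alpha}(x)}\,p(x)}{\sum_{x\in\mathcal{X}} p(x)\,D^{t\,l^\dagger_{t,\alpha}(x)}}, \quad x \in \mathcal{X}. \tag{41}$$

Proof: See Appendix A-C.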

Note that the solution stated under Theorem 5 corresponds, for $\alpha = 0$, to the Shannon code, which minimizes the average codeword length pay-off, while for $\alpha = 1$ (after manipulations) it is given by

$$l^\dagger_{t,\alpha=1}(x) = -\frac{1}{1+t}\log p(x) + \log\Big(\sum_{x\in\mathcal{X}} p(x)^{\frac{1}{1+t}}\Big). \tag{42}$$

Thus, (42) is precisely the solution of a variant of the Shannon code, minimizing the average of an exponential function of the codeword length pay-off [6], [7]. It can be shown that

$$\frac{1}{t}\log\Big(\sum_{x\in\mathcal{X}} p(x)\,D^{t\,l^\dagger_{t,\alpha=1}(x)}\Big) = \mathbb{H}_a(\mathbf{p}), \tag{43}$$

where $\mathbb{H}_a(\mathbf{p})$ is the Rényi entropy given by

$$\mathbb{H}_a(\mathbf{p}) = \frac{1}{1-a}\log\Big(\sum_{x\in\mathcal{X}} p(x)^a\Big), \quad a = \frac{1}{1+t}. \tag{44}$$
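For $\alpha = 1$ the closed form (42) and the Rényi entropy (44) can be checked numerically with the sketch below. The helper names `exp_optimal_lengths` and `renyi_entropy` and the test vector are hypothetical; the check that the exponential pay-off evaluated at these lengths equals the Rényi entropy of order $1/(1+t)$ relies on the classical identity for exponential-cost coding and is meant only as a sanity check.

```python
import math

def exp_optimal_lengths(p, t, D=2):
    """Real-valued lengths from the closed form (42), for alpha = 1."""
    a = 1 / (1 + t)
    norm = sum(px ** a for px in p)
    return [-a * math.log(px, D) + math.log(norm, D) for px in p]

def renyi_entropy(p, a, D=2):
    """Renyi entropy of order a (a != 1), in base D."""
    return math.log(sum(px ** a for px in p), D) / (1 - a)

p = [0.5, 0.25, 0.125, 0.125]
t = 2.0
lengths = exp_optimal_lengths(p, t)
kraft = sum(2 ** -l for l in lengths)     # should equal 1 (Kraft equality)
payoff = math.log(sum(px * 2 ** (t * l) for px, l in zip(p, lengths)), 2) / t
```

The lengths satisfy the Kraft inequality with equality, and the resulting pay-off agrees with `renyi_entropy(p, 1 / (1 + t))` up to rounding.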

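However, for any $\alpha \in (0,1)$ the following system of equations should be solved.

$$D^{-l^\dagger(x)} = \alpha\,\frac{D^{t\,l^\dagger(x)}\,p(x)}{\sum_{x\in\mathcal{X}} p(x)\,D^{t\,l^\dagger(x)}} + (1-\alpha)\,p(x), \quad \forall x \in \mathcal{X}. \tag{45}$$

Although the solution of Problem 1 is different from the solution of Problem 3, in the limit, as $t \to \infty$, the solutions should coincide, provided the merging rule on how the solution changes with $\alpha \in [0,1)$ is employed. To this end, consider the following identities.

$$\lim_{t\to\infty} \frac{1}{t}\log\Big(\sum_{x\in\mathcal{X}} p(x)D^{t\,l(x)}\Big) = \max_{x\in\mathcal{X}} l(x), \qquad \mathbb{L}_{\infty,\alpha}(\mathbf{l},\mathbf{p}) \triangleq \lim_{t\to\infty} \mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p}) = \mathbb{L}_\alpha(\mathbf{l},\mathbf{p}). \tag{46}$$

Since the pay-off $\mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p})$ is in the limit, as $t \to \infty$, equivalent to $\lim_{t\to\infty} \mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p}) = \mathbb{L}_\alpha(\mathbf{l},\mathbf{p})$, $\forall \alpha \in [0,1)$, the codeword length vector minimizing $\mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p})$ is expected to converge, in the limit as $t \to \infty$, to that which minimizes $\mathbb{L}_\alpha(\mathbf{l},\mathbf{p})$. To verify this claim, consider the behavior of the optimal two-parameter pay-off $\mathbb{L}_{t,\alpha}(\mathbf{l}^\dagger,\mathbf{p})$, for a fixed $\alpha \in [0,1)$ as $t$ increases, given in Theorem 5, which is equivalent to

$$D^{-l^\dagger(x)} = \alpha\,\frac{D^{t\,l^\dagger(x)}\,p(x)}{\sum_{x\in\mathcal{X}} p(x)\,D^{t\,l^\dagger(x)}} + (1-\alpha)\,p(x), \quad \forall x \in \mathcal{X}.$$

Write

$$\sum_{x\in\mathcal{X}} p(x)D^{t\,l^\dagger(x)} = \sum_{x\in\mathcal{U}^c} p(x)D^{t\,l^\dagger(x)} + \sum_{x\in\mathcal{U}} p(x)D^{t\,l^\dagger(x)}. \tag{47}$$

Utilizing the validity of the limits under (46), in the limit as $t \to \infty$, (47) becomes

$$D^{-l^\dagger(x)} = (1-\alpha)\,p(x), \quad x \in \mathcal{U}_k^c, \tag{48}$$

$$D^{-l^\dagger(x)} = \alpha\,\frac{p(x)}{\sum_{x\in\mathcal{U}_k} p(x)} + (1-\alpha)\,p(x), \quad x \in \mathcal{U}_k. \tag{49}$$

Since $p(x) = p(y)$, $\forall x,y \in \mathcal{U}_k$, (48) and (49) are the same as (23). These calculations verify that $\lim_{t\to\infty} \mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p}) = \mathbb{L}_\alpha(\mathbf{l},\mathbf{p})$, $\forall \mathbf{l}$, and in particular at $\mathbf{l} = \mathbf{l}^\dagger$. The point to be made here is that the solution of Problem 1 can be deduced from the solution of Problem 3 in the limit as $t \to \infty$, provided the merging rule on how the solution changes with $\alpha \in [0,1)$ is employed.

D. Generalizations: Connections to Universal Coding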

Although the current paper does not investigate universal coding problems, this exposition is included for the purpose of demonstrating that the optimal codes characterized under Problem 1 can be used to address problems of universal coding, having pay-off $\mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p})$ or $\mathbb{L}_\alpha(\mathbf{l}+\log\mathbf{p},\mathbf{p})$, and probability vector $\mathbf{p}$ belonging to a class of source probability vectors.
Recall that universal coding and universal modeling [18], and the so-called Minimum Description Length (MDL) principle and Stochastic Complexity [19], are often examined when the source probability distribution $\mathbf{p}$ is unknown, modeled via a parameterized class $\mathbf{p}_\theta \triangleq \{p_\theta(x) : x \in \mathcal{X},\ \theta \in \Theta\}$ ($\theta$ is a parameter vector), or a non-parameterized class $\mathbb{S}(\mathcal{X}) \subseteq \mathbb{P}(\mathcal{X})$. Universal coding, initiated in [11], [12] and further investigated in [20], [21], aims at constructing a code for sequences of symbols generated by unknown sources, $\mathbf{p}_\theta$ or $\mathbb{S}(\mathcal{X})$, such that as the length of the sequence increases, the average code length converges to the entropy of the true source that generated the sequence.

When the source probability vector is not a singleton set, but a family or a class of probability vectors, Problem 1 can be re-formulated to account for this generality as follows.

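Problem 4. Given a family of source probability vectors $\mathbf{p} \in \mathbb{S}(\mathcal{X}) \subseteq \mathbb{P}(\mathcal{X})$ and weighting parameter $\alpha \in [0,1)$, define the one-parameter pay-offs as follows.

A. Worst Case Maximum and Average Length.

$$\mathbb{L}^+_\alpha(\mathbf{l},\mathbf{p}) \triangleq \max_{\mathbf{p}\in\mathbb{S}(\mathcal{X})} \Big\{\alpha \max_{x\in\mathcal{X}} l(x) + (1-\alpha)\sum_{x\in\mathcal{X}} l(x)p(x)\Big\}. \tag{50}$$

B. Worst Case Maximum and Average Redundancy.

$$\mathbb{L}^+_\alpha(\mathbf{l}+\log\mathbf{p},\mathbf{p}) \triangleq \max_{\mathbf{p}\in\mathbb{S}(\mathcal{X})} \Big\{\alpha \max_{x\in\mathcal{X}} \big(l(x)+\log p(x)\big) + (1-\alpha)\Big(\sum_{x\in\mathcal{X}} l(x)p(x) - \mathbb{H}(\mathbf{p})\Big)\Big\}. \tag{51}$$

The objectives are the following.

• Find a prefix code length vector $\mathbf{l}^* \in \mathbb{R}_+^{|\mathcal{X}|}$ which minimizes the pay-off $\mathbb{L}^+_\alpha(\mathbf{l},\mathbf{p})$,

• Find a prefix code length vector $\mathbf{l}^* \in \mathbb{R}_+^{|\mathcal{X}|}$ which minimizes the pay-off $\mathbb{L}^+_\alpha(\mathbf{l}+\log\mathbf{p},\mathbf{p})$,
for all $\alpha \in [0,1)$.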

The universal coding problems defined above are based on minimax techniques, the minimization being over the codeword lengths satisfying the Kraft inequality, the maximization being over the class of probability vectors $\mathbb{S}(\mathcal{X})$. Next it will be shown how the complete characterization of the optimal codes for Problem 1 can be used to obtain a complete characterization for the above minimax problem, by applying von Neumann's minimax (or minisup) theorem. Consider the case when $\mathbb{S}(\mathcal{X})$ is compact (closed and bounded, since it is a subset of a finite-dimensional space) and convex. Then, since the set defining the Kraft inequality is compact and convex, the pay-off $\alpha\max_{x\in\mathcal{X}} l(x) + (1-\alpha)\sum_{x\in\mathcal{X}} l(x)p(x)$ is convex and continuous in $\mathbf{l} \in \mathbb{R}_+^{|\mathcal{X}|}$ for a fixed $\mathbf{p} \in \mathbb{S}(\mathcal{X})$, and convex and continuous in $\mathbf{p} \in \mathbb{S}(\mathcal{X})$ for a fixed $\mathbf{l} \in \mathbb{R}_+^{|\mathcal{X}|}$. By von Neumann's minimax theorem, the minimum over $\mathbf{l}^* \in \mathbb{R}_+^{|\mathcal{X}|}$ is interchanged with the maximum over $\mathbf{p} \in \mathbb{S}(\mathcal{X})$. Therefore, the solution of Problem 4 is characterized by maximizing, over $\mathbf{p} \in \mathbb{S}(\mathcal{X})$, the solution of Problem 1. On the other hand, if the compactness of the set $\mathbb{S}(\mathcal{X})$ is removed, then the maximization is replaced by a supremum and von Neumann's minisup theorem applies; hence one can interchange the minimum with the supremum, utilizing again the solution of Problem 1. Hence, the solution to the coding Problem 4 is within reach and it is based on the solution to Problem 1.

One may also investigate to what extent von Neumann's minimax theorem holds for the redundancy pay-off (51); for $\alpha = 1$, $\mathbb{L}^+_\alpha(\mathbf{l}+\log\mathbf{p},\mathbf{p})|_{\alpha=1}$ is investigated in [3], [13].

IV. Illustrative Examples 

This section presents two illustrative examples of the optimal codes derived in this paper, with emphasis on the merging rule which partitions the source alphabet $\mathcal{X}$ into $\mathcal{U}$ and $\mathcal{U}^c$ as a function of $\alpha \in [0,1)$.

A. Optimal weights for all $\alpha \in [0,1)$

Consider binary codewords and a source with $|\mathcal{X}| = 4$ and probability distribution

$$\mathbf{p} = \Big(\frac{8}{15}, \frac{4}{15}, \frac{2}{15}, \frac{1}{15}\Big).$$

Using Algorithm 1 one can find the optimal weight vector $\mathbf{w}^\dagger_\alpha$, for different values of $\alpha \in [0,1)$, for which pay-off (1) of Problem 1 is minimized. Computing $\alpha_1$ via (24) gives $\alpha_1 = 1/16$. For $\alpha = \alpha_1 = 1/16$ the optimal weights are

$$w^\dagger_4(\alpha) = w^\dagger_3(\alpha) = (1-\alpha)p_3 = \frac{1}{8}, \qquad w^\dagger_2(\alpha) = (1-\alpha)p_2 = \frac{1}{4}, \qquad w^\dagger_1(\alpha) = (1-\alpha)p_1 = \frac{1}{2}.$$

In this case, the resulting codeword lengths correspond to the optimal Huffman code. The weights for all $\alpha \in [0,1)$ can be calculated iteratively by calculating $\alpha_k$ for all $k \in \{0,1,2,3\}$ and noting that the weights vary linearly with $\alpha$ (Figure 2).
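The merge thresholds for this example can be reproduced with a few lines of Python. The function name `merge_thresholds` is hypothetical, the input is assumed to be sorted in non-increasing order, and exact fractions are used so that the values can be compared without rounding; $\sum_{x\in\mathcal{U}_k^c}p(x)$ is computed as one minus the mass of $\mathcal{U}_k$.

```python
from fractions import Fraction

def merge_thresholds(p):
    """Return [alpha_0, ..., alpha_{n-1}] for p sorted in non-increasing order."""
    n = len(p)
    alphas = [Fraction(0)]
    for k in range(n - 1):
        tail = sum(p[n - 1 - k:])              # mass of U_k (k+1 smallest)
        nxt, cur = p[n - 2 - k], p[n - 1 - k]  # next symbol to merge, last merged
        a_k = alphas[-1]
        alphas.append(a_k + (1 - a_k) * (nxt - cur) / ((1 - tail) / (k + 1) + nxt))
    return alphas

p = [Fraction(8, 15), Fraction(4, 15), Fraction(2, 15), Fraction(1, 15)]
alphas = merge_thresholds(p)
```

The last threshold coincides with $\alpha_{\max} = 1 - 1/(|\mathcal{X}|\,p(x_1)) = 17/32 = 0.53125$, beyond which all weights are equal.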



[Figure: "Weights for different values of α"; horizontal axis: Parameter α.]

Fig. 2. A schematic representation of the weights for different values of $\alpha$ when $\mathbf{p} = (\frac{8}{15}, \frac{4}{15}, \frac{2}{15}, \frac{1}{15})$.

Given the weights, the problem is transformed into a standard average length coding problem, in which the optimal codeword lengths can be easily calculated for all values of $\alpha$; they are equal to $\lceil -\log(w_\alpha(x)) \rceil$, $\forall x \in \mathcal{X}$. The schematic representation of the codeword lengths for $\alpha \in [0,1)$ is shown in Figure 3.






[Figure: "Codeword lengths for different values of α"; horizontal axis: Parameter α.]

Fig. 3. A schematic representation of the codeword lengths for different values of $\alpha$ when $\mathbf{p} = (\frac{8}{15}, \frac{4}{15}, \frac{2}{15}, \frac{1}{15})$.



From Figure 4 it is verified that the optimal pay-off is a non-decreasing concave function of $\alpha \in [0,1)$, and that for $\alpha \geq \alpha_3 = \alpha_{\max} = 0.53125$ the cost function remains unchanged.



[Figure: "Optimal cost function for different values of α"; horizontal axis: Parameter α.]

Fig. 4. A schematic representation of the multiobjective function for different values of $\alpha$ when $\mathbf{p} = (\frac{8}{15}, \frac{4}{15}, \frac{2}{15}, \frac{1}{15})$.

B. Limited-length coding examples

Consider binary codewords and a source with $|\mathcal{X}| = 8$ and probability distribution

$$\mathbf{p} = \Big(\frac{1}{26}, \frac{1}{26}, \frac{2}{26}, \frac{2}{26}, \frac{2}{26}, \frac{4}{26}, \frac{5}{26}, \frac{9}{26}\Big).$$

Using Algorithm 2 one can find the value of $\alpha$ for which the codeword length is less than or equal to $L_{\lim}$. Hence, the optimal weights $\mathbf{w}^\dagger_\alpha$ and codeword lengths $\mathbf{l}^\dagger$ for the given $\alpha$ can be found.

Consider, for example, the case $L_{\lim} = 5$; then $L_{\lim} > -\log(1/26)$ and hence the solution to the problem is the standard Shannon coding with $\alpha = 0$. This can also be inferred from Figure 6. Consider the case when the maximum length is 4 (i.e., $L_{\lim} = 4$); then $\alpha = 0.0521$ and the optimal lengths, listed from the most probable to the least probable symbol, are

$$\mathbf{l}^\dagger = (1.61,\ 2.46,\ 2.78,\ 3.78,\ 3.78,\ 3.78,\ 4,\ 4).$$

The average codeword length is 2.6355.
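The numbers quoted for $L_{\lim} = 4$ can be reproduced with the short sketch below. It assumes, as an input rather than a computed quantity, that the merged set consists of the two symbols of probability $1/26$ (which is what Algorithm 2 determines for this constraint), and then evaluates expression (38), the weights, the real-valued lengths, and the average length. Variable names are illustrative.

```python
import math

# Probabilities sorted in non-increasing order.
p = [9 / 26, 5 / 26, 4 / 26, 2 / 26, 2 / 26, 2 / 26, 1 / 26, 1 / 26]
L_lim, D, m = 4, 2, 2      # m = |U_k|, assumed known: the two symbols with p = 1/26

tail = sum(p[-m:])                                   # mass of U_k
alpha = 1 - (1 - m * D ** (-L_lim)) / (1 - tail)     # expression (38)
w_star = (alpha + (1 - alpha) * tail) / m            # common weight on U_k
weights = [(1 - alpha) * px for px in p[:-m]] + [w_star] * m
lengths = [-math.log2(w) for w in weights]           # real-valued optimal lengths
average = sum(px * l for px, l in zip(p, lengths))   # average codeword length
```

Here `alpha` equals $5/96 \approx 0.0521$ and the merged symbols receive weight $2^{-4}$, so their length equals the constraint exactly.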




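[Figure: weights versus Parameter α.]

Fig. 5. A schematic representation of the weights for different values of $\alpha$ when $\mathbf{p} = (\frac{1}{26}, \frac{1}{26}, \frac{2}{26}, \frac{2}{26}, \frac{2}{26}, \frac{4}{26}, \frac{5}{26}, \frac{9}{26})$.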



Consider the case $L_{\lim} = 3$; since $|\mathcal{X}| = 8$, there is no compression and all codeword lengths are equal to 3. In this case, $\hat\alpha = 0.6389$ is the minimum $\alpha$ for which there is no compression. This can be seen in Figures 5 and 6.

[Figure: codeword lengths versus Parameter α, with the point $\hat\alpha = 0.6389$ marked.]

Fig. 6. A schematic representation of the codeword lengths for different values of $\alpha$ when $\mathbf{p} = (\frac{1}{26}, \frac{1}{26}, \frac{2}{26}, \frac{2}{26}, \frac{2}{26}, \frac{4}{26}, \frac{5}{26}, \frac{9}{26})$.

Consider the case $L_{\lim} < 3$; then there is no $\alpha$ for which the maximum length will be equal to $L_{\lim}$.

V. Conclusion and Future Directions 

The solution to a lossless coding problem with a pay-off criterion consisting of a convex combination of average and maximum codeword length is presented. The solution consists of a re-normalization of the initial source probabilities according to a merging rule. Several properties of the solution are introduced and an algorithm is presented which computes the codeword lengths. The formulation and solution of this problem bridge an anthology of source coding problems with different pay-offs; relations to problems discussed in the literature are obtained, such as limited-length coding and coding with an exponential function of the codeword length. Illustrative examples corroborating the performance of the codes are presented.

The identification of a Huffman-like algorithm which solves the problem using integer-valued codeword lengths is left for future investigation, although it is believed that such an algorithm can be found based on the insight gained in this paper.

Appendix A
Proofs

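A. Proof of Lemma 1

By introducing a real-valued Lagrange multiplier $\lambda$ associated with the constraint, and using the partition $\mathcal{X} = \mathcal{U} \cup \mathcal{U}^c$, the augmented pay-off is defined by

$$\begin{aligned} \mathbb{L}_\alpha(\mathbf{l},\mathbf{p},\lambda) &\triangleq \alpha l^* + (1-\alpha)\sum_{x\in\mathcal{X}} l(x)p(x) + \lambda\Big(\sum_{x\in\mathcal{X}} D^{-l(x)} - 1\Big) \\ &= \sum_{x\in\mathcal{X}} \big(\alpha l^* + (1-\alpha)l(x)\big)p(x) + \lambda\Big(\sum_{x\in\mathcal{X}} D^{-l(x)} - 1\Big) \\ &= \Big(\alpha + (1-\alpha)\sum_{x\in\mathcal{U}} p(x)\Big) l^* + \sum_{x\in\mathcal{U}^c} (1-\alpha)p(x)l(x) + \lambda\Big(\sum_{x\in\mathcal{U}^c} D^{-l(x)} + \sum_{x\in\mathcal{U}} D^{-l^*} - 1\Big). \end{aligned} \tag{52}$$

The augmented pay-off is a convex and differentiable function with respect to $\mathbf{l}$. Denote the real-valued minimization of (52) over $\mathbf{l}$, $\lambda$ by $\mathbf{l}^\dagger$ and $\lambda^\dagger$. By the Karush-Kuhn-Tucker theorem, the following conditions are necessary and sufficient for optimality.

$$\frac{\partial}{\partial l(x)} \mathbb{L}_\alpha(\mathbf{l},\mathbf{p},\lambda)\Big|_{\mathbf{l}=\mathbf{l}^\dagger,\lambda=\lambda^\dagger} = 0, \tag{53}$$

$$\sum_{x\in\mathcal{X}} D^{-l^\dagger(x)} - 1 \leq 0, \tag{54}$$

$$\lambda^\dagger \cdot \Big(\sum_{x\in\mathcal{X}} D^{-l^\dagger(x)} - 1\Big) = 0, \tag{55}$$

$$\lambda^\dagger \geq 0. \tag{56}$$

Differentiating with respect to $\mathbf{l}$, when $x \in \mathcal{U}^c$ and $x \in \mathcal{U}$ the following equations are obtained.

$$\frac{\partial}{\partial l(x)} \mathbb{L}_\alpha(\mathbf{l},\mathbf{p},\lambda)\Big|_{\mathbf{l}=\mathbf{l}^\dagger,\lambda=\lambda^\dagger} = (1-\alpha)p(x) - \lambda^\dagger D^{-l^\dagger(x)}\log_e D = 0, \quad x \in \mathcal{U}^c, \tag{57}$$

$$\frac{\partial}{\partial l^*} \mathbb{L}_\alpha(\mathbf{l},\mathbf{p},\lambda)\Big|_{\mathbf{l}=\mathbf{l}^\dagger,\lambda=\lambda^\dagger} = \alpha\sum_{x\in\mathcal{U}^c} p(x) + \sum_{x\in\mathcal{U}} p(x) - \lambda^\dagger |\mathcal{U}|\, D^{-l^\dagger(x)}\log_e D = 0, \quad x \in \mathcal{U}. \tag{58}$$

When $\lambda^\dagger = 0$, (57) gives $(1-\alpha)p(x) = 0$, $\forall x \in \mathcal{U}^c$. Since $p(x) > 0$, then necessarily $\alpha = 1$. This is the case when there is no compression, since $\mathcal{U} = \mathcal{X}$. For $\alpha \in [0,1)$, necessarily $\lambda^\dagger > 0$. Therefore, by restricting $\alpha \in [0,1)$, (57) and (58) are equivalent to the following identities.

$$D^{-l^\dagger(x)} = \frac{(1-\alpha)p(x)}{\lambda^\dagger \log_e D}, \quad x \in \mathcal{U}^c, \tag{59}$$

$$D^{-l^\dagger(x)} = \frac{\alpha\sum_{x\in\mathcal{U}^c} p(x) + \sum_{x\in\mathcal{U}} p(x)}{\lambda^\dagger |\mathcal{U}| \log_e D}, \quad x \in \mathcal{U}. \tag{60}$$

Next, $\lambda^\dagger$ is found by substituting (59) and (60) into the Kraft equality to deduce

$$\begin{aligned} \sum_{x\in\mathcal{X}} D^{-l^\dagger(x)} &= \sum_{x\in\mathcal{U}^c} D^{-l^\dagger(x)} + \sum_{x\in\mathcal{U}} D^{-l^\dagger(x)} \\ &= \sum_{x\in\mathcal{U}^c} \frac{(1-\alpha)p(x)}{\lambda^\dagger\log_e D} + \sum_{x\in\mathcal{U}} \frac{\alpha\sum_{x\in\mathcal{U}^c} p(x) + \sum_{x\in\mathcal{U}} p(x)}{\lambda^\dagger|\mathcal{U}|\log_e D} \\ &= \frac{(1-\alpha)\sum_{x\in\mathcal{U}^c} p(x)}{\lambda^\dagger\log_e D} + \frac{\alpha\sum_{x\in\mathcal{U}^c} p(x) + \sum_{x\in\mathcal{U}} p(x)}{\lambda^\dagger\log_e D} \\ &= \frac{1}{\lambda^\dagger\log_e D} = 1. \end{aligned}$$

Therefore, $\lambda^\dagger = \frac{1}{\log_e D}$. Substituting $\lambda^\dagger$ into (59) and (60) yields

$$D^{-l^\dagger(x)} = \begin{cases} (1-\alpha)p(x), & x \in \mathcal{U}^c, \\ \dfrac{\alpha\sum_{x\in\mathcal{U}^c} p(x) + \sum_{x\in\mathcal{U}} p(x)}{|\mathcal{U}|}, & x \in \mathcal{U}. \end{cases}$$

Finally, from the previous expression one obtains

$$l^\dagger(x) = \begin{cases} -\log\big((1-\alpha)p(x)\big), & x \in \mathcal{U}^c, \\ -\log\Big(\dfrac{\alpha\sum_{x\in\mathcal{U}^c} p(x) + \sum_{x\in\mathcal{U}} p(x)}{|\mathcal{U}|}\Big), & x \in \mathcal{U}. \end{cases}$$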

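B. Proof of Proposition 1

Consider the optimal pay-off

$$\mathbb{L}_\alpha(\mathbf{l}^\dagger,\mathbf{p}) = \mathbb{L}(\mathbf{l}^\dagger,\mathbf{w}_\alpha) = \sum_{x\in\mathcal{X}} w_\alpha(x)\,l^\dagger_\alpha(x) = -\sum_{x\in\mathcal{X}} w_\alpha(x)\log(w_\alpha(x)).$$

Differentiating with respect to $\alpha$,

$$\begin{aligned} \frac{\partial \mathbb{L}(\mathbf{l}^\dagger,\mathbf{w}_\alpha)}{\partial\alpha} &= -\sum_{x\in\mathcal{X}} \frac{\partial\big(w_\alpha(x)\log(w_\alpha(x))\big)}{\partial\alpha} \\ &= -\sum_{x\in\mathcal{X}} \Big(\frac{\partial w_\alpha(x)}{\partial\alpha}\log(w_\alpha(x)) + w_\alpha(x)\frac{\partial\log(w_\alpha(x))}{\partial\alpha}\Big) \\ &= -\sum_{x\in\mathcal{X}} \big(w'_\alpha(x)\log(w_\alpha(x)) + w'_\alpha(x)\log_D e\big), \quad \text{where } w'_\alpha(x) \triangleq \frac{\partial w_\alpha(x)}{\partial\alpha}, \\ &= -\sum_{x\in\mathcal{X}} w'_\alpha(x)\log(w_\alpha(x)) - \log_D e \underbrace{\sum_{x\in\mathcal{X}} w'_\alpha(x)}_{=0} \\ &= \Big(-1 + \sum_{x\in\mathcal{U}} p(x)\Big)\log(w^*_\alpha) + \sum_{x\in\mathcal{U}^c} p(x)\log\big((1-\alpha)p(x)\big). \end{aligned}$$

Multiplying both sides by $(1-\alpha)$,

$$\begin{aligned} (1-\alpha)\frac{\partial \mathbb{L}(\mathbf{l}^\dagger,\mathbf{w}_\alpha)}{\partial\alpha} &= -\log(w^*_\alpha) + \sum_{x\in\mathcal{U}} w^*_\alpha\log(w^*_\alpha) + \sum_{x\in\mathcal{U}^c} w_\alpha(x)\log(w_\alpha(x)) \\ &= -\log(w^*_\alpha) + \sum_{x\in\mathcal{X}} w_\alpha(x)\log(w_\alpha(x)) \\ &= \sum_{x\in\mathcal{X}} w_\alpha(x)\big(-\log(w^*_\alpha) + \log(w_\alpha(x))\big) = \sum_{x\in\mathcal{X}} w_\alpha(x)\log\Big(\frac{w_\alpha(x)}{w^*_\alpha}\Big). \end{aligned}$$

Since $w_\alpha(x) \geq w^*_\alpha$, then $\log\big(\frac{w_\alpha(x)}{w^*_\alpha}\big) \geq 0$ and therefore $\frac{\partial \mathbb{L}(\mathbf{l}^\dagger,\mathbf{w}_\alpha)}{\partial\alpha} \geq 0$. Hence, $\mathbb{L}(\mathbf{l}^\dagger,\mathbf{w}_\alpha)$ is a non-decreasing function of $\alpha \in [0,1)$. The second derivative of $\mathbb{L}(\mathbf{l}^\dagger,\mathbf{w}_\alpha)$ is

$$\frac{\partial^2 \mathbb{L}(\mathbf{l}^\dagger,\mathbf{w}_\alpha)}{\partial\alpha^2} = -\log_D e\,\Big(\sum_{x\in\mathcal{U}} \frac{(w^{*\prime}_\alpha)^2}{w^*_\alpha} + \sum_{x\in\mathcal{U}^c} \frac{p(x)^2}{(1-\alpha)p(x)}\Big),$$

that is,

$$\frac{1}{\log_D e}\frac{\partial^2 \mathbb{L}(\mathbf{l}^\dagger,\mathbf{w}_\alpha)}{\partial\alpha^2} = -\frac{\big(1 - \sum_{x\in\mathcal{U}} p(x)\big)^2}{\alpha + (1-\alpha)\sum_{x\in\mathcal{U}} p(x)} - \sum_{x\in\mathcal{U}^c} \frac{p(x)}{1-\alpha} \leq 0.$$

Consequently, $\mathbb{L}(\mathbf{l}^\dagger,\mathbf{w}_\alpha)$ is a concave non-decreasing function of $\alpha \in [0,1)$. Note that $\frac{\partial^2 \mathbb{L}(\mathbf{l}^\dagger,\mathbf{w}_\alpha)}{\partial\alpha^2} = \frac{\partial \mathbb{L}(\mathbf{l}^\dagger,\mathbf{w}_\alpha)}{\partial\alpha} = 0$ when $w_\alpha(x) = w^*_\alpha$, $\forall x \in \mathcal{X}$.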

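C. Proof of Theorem 5

Since the two-parameter pay-off is a convex function and the constraint set is convex, this is a convex optimization problem. The augmented pay-off is defined by

$$\mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p},\lambda) \triangleq \frac{\alpha}{t}\log\Big(\sum_{x\in\mathcal{X}} p(x)D^{t\,l(x)}\Big) + (1-\alpha)\sum_{x\in\mathcal{X}} l(x)p(x) + \lambda\Big(\sum_{x\in\mathcal{X}} D^{-l(x)} - 1\Big). \tag{61}$$

The augmented pay-off is a convex and differentiable function with respect to $\mathbf{l}$. Denote the real-valued minimization of (61) over $\mathbf{l}$, $\lambda$ by $\mathbf{l}^\dagger$ and $\lambda^\dagger$. By the Karush-Kuhn-Tucker theorem, the following conditions are necessary and sufficient for optimality.

$$\frac{\partial}{\partial l(x)} \mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p},\lambda)\Big|_{\mathbf{l}=\mathbf{l}^\dagger,\lambda=\lambda^\dagger} = 0, \tag{62}$$

$$\sum_{x\in\mathcal{X}} D^{-l^\dagger(x)} - 1 \leq 0, \tag{63}$$

$$\lambda^\dagger \cdot \Big(\sum_{x\in\mathcal{X}} D^{-l^\dagger(x)} - 1\Big) = 0, \tag{64}$$

$$\lambda^\dagger \geq 0. \tag{65}$$

Evaluating the derivative in (62) with respect to $\mathbf{l}$ gives

$$\frac{\partial}{\partial l(x)} \mathbb{L}_{t,\alpha}(\mathbf{l},\mathbf{p},\lambda)\Big|_{\mathbf{l}=\mathbf{l}^\dagger,\lambda=\lambda^\dagger} = \alpha\,\frac{D^{t\,l^\dagger(x)}p(x)}{\sum_{x\in\mathcal{X}} p(x)D^{t\,l^\dagger(x)}} + (1-\alpha)p(x) - \lambda^\dagger D^{-l^\dagger(x)}\log_e D = 0, \quad \forall x \in \mathcal{X}. \tag{66}$$

When $\lambda^\dagger = 0$, (66) gives

$$\alpha\,\frac{D^{t\,l^\dagger(x)}p(x)}{\sum_{x\in\mathcal{X}} p(x)D^{t\,l^\dagger(x)}} + (1-\alpha)p(x) = 0, \quad \forall x \in \mathcal{X}.$$

Summing over $\mathcal{X}$ gives $\alpha + (1-\alpha) = 0$, which is impossible; hence necessarily $\lambda^\dagger > 0$. Consequently, the Kraft inequality holds with equality, $\sum_{x\in\mathcal{X}} D^{-l^\dagger(x)} = 1$. Summing over $\mathcal{X}$ on both sides of (66) gives $\lambda^\dagger = \frac{1}{\log_e D}$. Substituting $\lambda^\dagger$ into (66) gives the following set of equations that describe the optimal codeword lengths.

$$D^{-l^\dagger(x)} = \alpha\,\frac{D^{t\,l^\dagger(x)}p(x)}{\sum_{x\in\mathcal{X}} p(x)D^{t\,l^\dagger(x)}} + (1-\alpha)p(x), \quad \forall x \in \mathcal{X}. \tag{67}$$

Consequently, the optimal codeword lengths are given by (40).

Appendix B
Waterfilling-like solution of Problem 1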

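The pay-off $\mathbb{L}_\alpha(\mathbf{l},\mathbf{p})$ is a convex combination of the maximum and the average codeword length. The problem can be expressed as

$$\min_{t}\min_{\mathbf{l}} \Big\{\alpha t + (1-\alpha)\sum_{x\in\mathcal{X}} l(x)p(x)\Big\} \tag{68}$$

subject to the Kraft inequality and the constraint $l(x) \leq t$, $\forall x \in \mathcal{X}$, where $t \triangleq \max_{x\in\mathcal{X}} l(x)$. By introducing real-valued Lagrange multipliers $\lambda(x)$ associated with the constraint $l(x) \leq t$, $\forall x \in \mathcal{X}$, and a real-valued Lagrange multiplier $\nu$ associated with the Kraft inequality, the augmented pay-off is defined by

$$\mathbb{L}_\alpha(\mathbf{l},\mathbf{p},t,\lambda,\nu) \triangleq \alpha t + (1-\alpha)\sum_{x\in\mathcal{X}} l(x)p(x) + \nu\Big(\sum_{x\in\mathcal{X}} D^{-l(x)} - 1\Big) + \sum_{x\in\mathcal{X}} \lambda(x)\big(l(x) - t\big).$$

The augmented pay-off is a convex and differentiable function with respect to $\mathbf{l}$ and $t$. Denote the real-valued minimization over $\mathbf{l}$, $t$, $\lambda$, $\nu$ by $\mathbf{l}^\dagger$, $t^\dagger$, $\lambda^\dagger$ and $\nu^\dagger$. By the Karush-Kuhn-Tucker theorem, the following conditions are necessary and sufficient for optimality.

$$\frac{\partial}{\partial l(x)} \mathbb{L}_\alpha(\mathbf{l},\mathbf{p},t,\lambda,\nu)\Big|_{\mathbf{l}=\mathbf{l}^\dagger,\lambda=\lambda^\dagger,t=t^\dagger,\nu=\nu^\dagger} = 0, \tag{69}$$

$$\frac{\partial}{\partial t} \mathbb{L}_\alpha(\mathbf{l},\mathbf{p},t,\lambda,\nu)\Big|_{\mathbf{l}=\mathbf{l}^\dagger,\lambda=\lambda^\dagger,t=t^\dagger,\nu=\nu^\dagger} = 0, \tag{70}$$

$$\sum_{x\in\mathcal{X}} D^{-l^\dagger(x)} - 1 \leq 0, \tag{71}$$

$$\nu^\dagger \cdot \Big(\sum_{x\in\mathcal{X}} D^{-l^\dagger(x)} - 1\Big) = 0, \tag{72}$$

$$\nu^\dagger \geq 0, \tag{73}$$

$$l^\dagger(x) - t^\dagger \leq 0, \quad \forall x \in \mathcal{X}, \tag{74}$$

$$\lambda^\dagger(x) \cdot \big(l^\dagger(x) - t^\dagger\big) = 0, \quad \forall x \in \mathcal{X}, \tag{75}$$

$$\lambda^\dagger(x) \geq 0, \quad \forall x \in \mathcal{X}. \tag{76}$$

Differentiating with respect to $\mathbf{l}$, the following equation is obtained:

$$\frac{\partial}{\partial l(x)} \mathbb{L}_\alpha(\mathbf{l},\mathbf{p},t,\lambda,\nu)\Big|_{\mathbf{l}=\mathbf{l}^\dagger,\lambda=\lambda^\dagger,t=t^\dagger,\nu=\nu^\dagger} = (1-\alpha)p(x) - \nu^\dagger D^{-l^\dagger(x)}\log_e D + \lambda^\dagger(x) = 0. \tag{77}$$

Differentiating with respect to $t$, the following equation is obtained:

$$\frac{\partial}{\partial t} \mathbb{L}_\alpha(\mathbf{l},\mathbf{p},t,\lambda,\nu)\Big|_{\mathbf{l}=\mathbf{l}^\dagger,\lambda=\lambda^\dagger,t=t^\dagger,\nu=\nu^\dagger} = \alpha - \sum_{x\in\mathcal{X}} \lambda^\dagger(x) = 0. \tag{78}$$

When $\nu^\dagger = 0$, (77) gives $(1-\alpha)p(x) + \lambda^\dagger(x) = 0$, $\forall x \in \mathcal{X}$. Since $p(x) > 0$ and $\lambda^\dagger(x) \geq 0$, then necessarily $\alpha = 1$. This is the case when there is no compression. For $\alpha \in [0,1)$, necessarily $\nu^\dagger > 0$.

By restricting $\alpha \in [0,1)$, (77) and (78) are equivalent to the following identities:

$$D^{-l^\dagger(x)} = \frac{(1-\alpha)p(x) + \lambda^\dagger(x)}{\nu^\dagger\log_e D}, \quad x \in \mathcal{X}, \tag{79}$$

$$\sum_{x\in\mathcal{X}} \lambda^\dagger(x) = \alpha. \tag{80}$$

Next, $\nu^\dagger$ is found by substituting (79) and (80) into the Kraft equality to deduce

$$\begin{aligned} \sum_{x\in\mathcal{X}} D^{-l^\dagger(x)} &= \sum_{x\in\mathcal{X}} \frac{(1-\alpha)p(x) + \lambda^\dagger(x)}{\nu^\dagger\log_e D} = \frac{(1-\alpha)\sum_{x\in\mathcal{X}} p(x) + \sum_{x\in\mathcal{X}} \lambda^\dagger(x)}{\nu^\dagger\log_e D} \\ &= \frac{(1-\alpha) + \alpha}{\nu^\dagger\log_e D} = \frac{1}{\nu^\dagger\log_e D} = 1. \end{aligned}$$

Therefore, $\nu^\dagger = \frac{1}{\log_e D}$. Substituting $\nu^\dagger$ into (79) yields

$$D^{-l^\dagger(x)} = (1-\alpha)p(x) + \lambda^\dagger(x), \quad x \in \mathcal{X}. \tag{81}$$

Substituting $\lambda^\dagger(x)$ into (80) we have

$$\sum_{x\in\mathcal{X}} \big(D^{-l^\dagger(x)} - (1-\alpha)p(x)\big) = \alpha. \tag{82}$$

Let $w^\dagger(x) \triangleq D^{-l^\dagger(x)}$, i.e., the probabilities that correspond to the codeword lengths $l^\dagger(x)$; also, let $w^* \triangleq D^{-t^\dagger}$. Then, (82) can be written as

$$\sum_{x\in\mathcal{X}} \big(w^\dagger(x) - (1-\alpha)p(x)\big) = \alpha. \tag{83}$$

From the Karush-Kuhn-Tucker conditions (75) and (76) we deduce that for $l(x) < t$, $\lambda(x) = 0$, and equation (83) becomes

$$\sum_{x\in\mathcal{X}} \big(w^* - (1-\alpha)p(x)\big)^+ = \alpha, \tag{84}$$

where $(f)^+ = \max(0,f)$. This is the classical waterfilling equation [2, Section 9.4] and $w^*$ is the water-level chosen, as shown in Figure 7.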



[Figure: bar diagram of the levels $(1-\alpha)p_i$ for each symbol (horizontal axis: symbol; vertical axis: weight), with the water-level $w^*$ marked.]

Fig. 7. Waterfilling solution of the coding problem.



For the solution of Problem 2, for which we consider the limited-length case, $w^* = D^{-L_{\lim}}$ and equation (84) needs to be solved for $\alpha$.
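The waterfilling equation (84) can be solved numerically by bisection on the water-level, since its left-hand side is non-decreasing in $w^*$. The sketch below is illustrative only: the function name `water_level`, the bracketing interval $[0,1]$, and the iteration count are assumptions, and $\alpha > 0$ is assumed so that the water-level is uniquely determined.

```python
def water_level(p, alpha, iters=100):
    """Solve sum_x max(0, w - (1 - alpha) * p(x)) = alpha for w by bisection."""
    lo, hi = 0.0, 1.0                  # filled(0) = 0 <= alpha <= filled(1)
    for _ in range(iters):
        mid = (lo + hi) / 2
        filled = sum(max(0.0, mid - (1 - alpha) * px) for px in p)
        if filled < alpha:
            lo = mid                   # too little water: raise the level
        else:
            hi = mid                   # enough water: lower the level
    return hi

p4 = [8 / 15, 4 / 15, 2 / 15, 1 / 15]
p8 = [x / 26 for x in (1, 1, 2, 2, 2, 4, 5, 9)]
w4 = water_level(p4, 1 / 16)           # example of Section IV-A
w8 = water_level(p8, 5 / 96)           # example of Section IV-B, L_lim = 4
```

For the two examples of Section IV the computed water-levels are $1/8$ and $2^{-4}$, matching the merged weights found there; for the limited-length problem one would instead fix $w^* = D^{-L_{\lim}}$ and search over $\alpha$.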